Common pandas DataFrame operations

searchstar 2021-08-21 00:36:44

Official pandas documentation: https://pandas.pydata.org/docs/reference/

Official DataFrame documentation: https://pandas.pydata.org/docs/reference/frame.html

Creating a DataFrame: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

Related topics: checking whether two DataFrames are equal; getting a DataFrame's row and column counts; selecting a column by its position.
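A minimal sketch of those three operations, assuming small illustrative DataFrames (the names a, b, same, and second_col are made up for this example):

```python
import pandas as pd

a = pd.DataFrame({'col1': [0, 1], 'col2': [2, 3]})
b = pd.DataFrame({'col1': [0, 1], 'col2': [2, 3]})

# equals() compares shape and elements and returns a single boolean
same = a.equals(b)

# shape is a (number of rows, number of columns) tuple
n_rows, n_cols = a.shape

# iloc selects by integer position; this takes the second column as a Series
second_col = a.iloc[:, 1]
```

Note that a == b would instead return an element-wise DataFrame of booleans, which is why equals() is the usual choice for a yes/no answer.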

Adding a new column: https://www.geeksforgeeks.org/adding-new-column-to-existing-dataframe-in-pandas/
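As a rough illustration of the common ways to add a column (the column names col3, total, and doubled are invented for this sketch):

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 1], 'col2': [2, 3]})

# Assigning to a new column name adds the column in place
df['col3'] = [4, 5]

# A column can also be computed from existing columns
df['total'] = df['col1'] + df['col2']

# assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(doubled=df['col1'] * 2)
```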

Reading from stdin

Just pass sys.stdin where a file would normally go:

latencies = pd.read_table(sys.stdin, names=['operation', 'latency(ns)'], sep=r'\s+')  # delim_whitespace is deprecated in newer pandas

Source: https://stackoverflow.com/questions/18495846/pandas-data-from-stdin
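To try this without piping real data in, a file-like object can stand in for sys.stdin. The sketch below uses io.StringIO with made-up sample lines; fake_stdin and its contents are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample input; in a real script, pass sys.stdin instead of fake_stdin
fake_stdin = io.StringIO("read 120\nwrite 340\nread 100\n")

# sep=r'\s+' splits on any run of whitespace; names supplies the column headers
latencies = pd.read_table(fake_stdin, names=['operation', 'latency(ns)'], sep=r'\s+')
```

In an actual script the same call would be fed with something like `cat latencies.txt | python script.py`, where both file names are placeholders.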

Adding new rows

https://pandas.pydata.org/docs/reference/api/pandas.concat.html#pandas.concat

Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so use pd.concat instead: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html

To add a Series as a row of a DataFrame, first convert it to a DataFrame and then transpose it:

r = pd.Series([1, 2, 3], index=['col1', 'col2', 'col3'])
d = pd.DataFrame({'col1': [0, 1], 'col2': [2, 3], 'col3': [4, 5]})
pd.concat([d, pd.DataFrame(r).T])
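One detail worth knowing: concat keeps the original index labels, so the appended row arrives with label 0 and the result has a duplicate label. The sketch below shows one way to renumber, using the ignore_index parameter of pd.concat; the variable names row, kept, and renumbered are illustrative.

```python
import pandas as pd

r = pd.Series([1, 2, 3], index=['col1', 'col2', 'col3'])
d = pd.DataFrame({'col1': [0, 1], 'col2': [2, 3], 'col3': [4, 5]})

# r.to_frame().T gives the same one-row DataFrame as pd.DataFrame(r).T
row = r.to_frame().T

# Default behaviour: index labels are kept, giving 0, 1, 0
kept = pd.concat([d, row])

# ignore_index=True renumbers the result as 0, 1, 2
renumbered = pd.concat([d, row], ignore_index=True)
```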

Selecting a range of rows

The syntax is similar to slicing a built-in Python list:

test = pd.DataFrame({'col1': range(0, 10), 'col2': range(10, 20)})
# Select the 2nd through 4th rows (positions 1 to 3; the end position 4 is excluded):
test[1:4]
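For comparison, here is a small sketch of the explicit indexers. iloc slices by position like the example above, while loc slices by label and includes the end label, which is an easy thing to trip over. The names by_position and by_label are made up for this example.

```python
import pandas as pd

test = pd.DataFrame({'col1': range(0, 10), 'col2': range(10, 20)})

# Positions 1, 2, 3 (end excluded), the same rows as test[1:4]
by_position = test.iloc[1:4]

# Labels 1, 2, 3, 4 (end included)
by_label = test.loc[1:4]
```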

Selecting multiple columns

test = pd.DataFrame({'col1': [0, 1], 'col2': [2, 3], 'col3': [4, 5]})
# Select columns col2 and col1, in that order
test[['col2', 'col1']]

Output:

   col2  col1
0     2     0
1     3     1

Computing the mean

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html

test = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [4, 5, 6, 7]})
# Mean of each column (axis=0 is the default, so these two calls are equivalent)
test.mean()
test.mean(axis=0)
# Mean of each row
test.mean(axis=1)
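To make the axis argument concrete, this sketch captures both results for the same data. Each call returns a Series: one entry per column for axis=0, one entry per row for axis=1. The names col_means and row_means are illustrative.

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [4, 5, 6, 7]})

# Indexed by column name: col1 -> 1.5, col2 -> 5.5
col_means = test.mean()

# Indexed by row label: 2.0, 3.0, 4.0, 5.0
row_means = test.mean(axis=1)
```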

groupby

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

Grouping every 3 rows and computing the mean

test = pd.DataFrame({'col1': range(0, 10), 'col2': range(10, 20)})
test.groupby(test.index // 3).mean()

Output:

   col1  col2
0   1.0  11.0
1   4.0  14.0
2   7.0  17.0
3   9.0  19.0

Collecting rows that share a value in some column into a list

https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby
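A minimal sketch of one common approach, assuming a toy DataFrame with hypothetical columns key and value: group by the key column and apply list to the values of each group.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'c'], 'value': [1, 2, 3, 4, 5]})

# For each distinct key, collect the matching values into a Python list.
# The result is a Series indexed by key.
lists = df.groupby('key')['value'].apply(list)
```

Calling lists.reset_index() turns the result back into a two-column DataFrame if that shape is more convenient.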

Iterating over rows

https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
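As a short sketch of the two built-in row iterators (the variables sums and pairs are made up for this example): iterrows() yields each row as a Series, and itertuples() yields namedtuples, which is usually the faster of the two.

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 1], 'col2': [2, 3]})

# iterrows() yields (index label, row as a Series) pairs
sums = []
for idx, row in test.iterrows():
    sums.append(row['col1'] + row['col2'])

# itertuples() yields namedtuples; the index is available as t.Index
pairs = [(t.Index, t.col1, t.col2) for t in test.itertuples()]
```

For simple arithmetic like the sum here, a vectorized expression such as test['col1'] + test['col2'] avoids the loop entirely and is generally preferable on large DataFrames.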