python - How does the pandas groupby method actually work?

Published: 2020-05-27 19:15:05 Category: Python Source: Internet


So I am trying to understand the pandas.DataFrame.groupby() function, and I came across this example in the documentation:

In [1]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
   ...:                           'foo', 'bar', 'foo', 'foo'],
   ...:                    'B' : ['one', 'one', 'two', 'three',
   ...:                           'two', 'two', 'one', 'three'],
   ...:                    'C' : np.random.randn(8),
   ...:                    'D' : np.random.randn(8)})
   ...: 

In [2]: df
Out[2]: 
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

Without exploring any further, I did this:

print(df.groupby('B').head())

It outputs the same DataFrame, but when I do this:

print(df.groupby('B'))

it gives me this:

<pandas.core.groupby.DataFrameGroupBy object at 0x7f65a585b390>

What does this mean? On a normal DataFrame, printing .head() outputs only the first 5 rows, so what is happening here?

Also, why does printing .head() give the same output as the DataFrame itself? Shouldn't it be grouped by the elements of column 'B'?

Solution

When you use

df.groupby('A')

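you get a groupby object. You have not applied any function to it yet. Under the hood, and although this definition may not be perfect, you can think of a groupby object as: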

> an iterator of (group, DataFrame) pairs, for a DataFrame, or
> an iterator of (group, Series) pairs, for a Series.

To illustrate:

import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})
grouped = df.groupby('A')

# each `i` is a tuple of (group,DataFrame)
# so your output here will be a little messy
for i in grouped:
    print(i)
(1,    A  B
0  1  1
1  1  2)
(2,    A  B
2  2  3
3  2  4)

# this version uses multiple counters
# in a single loop.  each `group` is a group key, each
# `frame` is its corresponding DataFrame (named `frame`
# so that the original `df` is not overwritten)
for group, frame in grouped:
    print('group of A:', group, '\n')
    print(frame, '\n')
group of A: 1 

   A  B
0  1  1
1  1  2 

group of A: 2 

   A  B
2  2  3
3  2  4 

# and if you just wanted to visualize the groups,
# your second counter is a "throwaway"
for group, _ in grouped:
    print('group of A:', group, '\n')
group of A: 1 

group of A: 2
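
The loops above only read the groups. As a rough sketch of the other half of the picture, the snippet below first collects the (group, DataFrame) pairs into a dict and then applies two built-in groupby methods, `size()` and `sum()`, to get one result per group. The names `frames`, `sizes` and `sums` are made up for this illustration and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})
grouped = df.groupby('A')  # nothing is computed yet

# Iterating yields (group key, DataFrame) pairs; keep them in a dict.
frames = {key: frame for key, frame in grouped}

# Applying a method produces one result per group.
sizes = grouped.size()      # number of rows in each group
sums = grouped['B'].sum()   # sum of column B within each group

print(sorted(frames))
print(sizes)
print(sums)
```

Printing `grouped` itself only shows the object's repr, which is why the question saw `<pandas.core.groupby.DataFrameGroupBy object at ...>`. Data appears once you iterate or call a method on it.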

Now the same goes for .head. Just look at the docs for the method:

Essentially equivalent to .apply(lambda x: x.head(n))

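So here you are actually applying a function to each group of the groupby object. Keep in mind that .head(5) is applied to each group (each DataFrame), and because every group has 5 rows or fewer, you get back the original DataFrame.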

Consider the earlier example with columns A and B. If you use .head(1), you get only the first row of each group:

print(df.groupby('A').head(1))
   A  B
0  1  1
2  2  3
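
To tie this back to the question's data, here is a minimal sketch that uses the same A and B labels as the table in the question. Column `C` is filled with `range(8)` as a stand-in for the random values, and the variable names are assumptions made for the illustration. It checks that no group under `'B'` has more than 5 rows, so `head()` keeps everything, and then compares `head(1)` with taking the first row of each group by hand.

```python
import pandas as pd

# Same A/B labels as the question's table; C stands in for the random columns.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': range(8),
})

# Every group has at most 5 rows, so head() (default n=5) keeps every row.
group_sizes = df.groupby('B').size()
all_rows = df.groupby('B').head()

# head(1) keeps the first row of each group, in the original row order.
first_rows = df.groupby('B').head(1)

# Roughly the same thing by hand: head(1) of each group's DataFrame,
# concatenated and put back into the original row order.
by_hand = pd.concat(
    [frame.head(1) for _, frame in df.groupby('B')]
).sort_index()

print(group_sizes)
print(len(all_rows) == len(df))
print(list(first_rows.index))
print(list(by_hand.index))
```

The by-hand version is only an approximation of what pandas does internally, but it selects the same rows, which is the point of the "essentially equivalent" note in the docs.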

