|
By convention, we import as follows:
# @param 10 Minutes to pandas
# @author 脚本之家 jb51.cc|www.www.jb51.cc
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import matplotlib.pyplot as plt
# End www.jb51.cc
Object Creation
Create a Series by passing a list of values, letting pandas create a default integer index:
In [4]: s = pd.Series([1,3,5,np.nan,6,8])
In [5]: s
Out[5]:
0     1
1     3
2     5
3   NaN
4     6
5     8
dtype: float64
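The float64 dtype in the output above comes from the np.nan entry: a missing value cannot be stored in an integer column, so pandas upcasts the whole Series. A minimal sketch of how this could be checked (the print calls are illustrative and not part of the original session):

```python
import numpy as np
import pandas as pd

# A list containing np.nan cannot be held as integers,
# so pandas stores the whole Series as float64.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)           # dtype of the stored values
print(list(s.index))     # default integer index
print(s.isna().sum())    # number of missing entries
```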
Create a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
In [6]: dates = pd.date_range('20130101',periods=6)
In [7]: dates
Out[7]:
[2013-01-01, ..., 2013-01-06]
Length: 6, Freq: D, Timezone: None
In [8]: df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
In [9]: df
Out[9]:
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
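Because np.random.randn draws fresh random numbers on every run, your values will differ from the ones printed above. If reproducible output matters, one option is a seeded generator. The sketch below assumes NumPy's default_rng API and a seed of 0, neither of which appears in the original session:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# Seeded generator (an assumption, not in the original session)
# so that repeated runs produce the same numbers.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)
print(df.index[0], df.index[-1])
```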
Create a DataFrame by passing a dict of objects that can be converted to something series-like:
In [10]: df2 = pd.DataFrame({ 'A' : 1.,
   ....:                      'B' : pd.Timestamp('20130102'),
   ....:                      'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
   ....:                      'D' : np.array([3] * 4,dtype='int32'),
   ....:                      'E' : pd.Categorical(["test","train","test","train"]),
   ....:                      'F' : 'foo' })
   ....:
In [11]: df2
Out[11]:
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
The resulting columns have specific dtypes:
In [12]: df2.dtypes
Out[12]:
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
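Two things happen in the dict constructor above: scalars are broadcast to every row, and each column keeps the dtype it was given. The following is a rough sketch with a reduced set of columns. The select_dtypes call is an addition for illustration and is not part of the original session:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,                                                  # scalar, broadcast to all rows
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),  # supplies the row index 0..3
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
})

print(df2.dtypes)

# Keep only the numeric columns; the categorical column E is dropped.
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))
```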
If you are using IPython, tab completion for column names (as well as public attributes) is enabled automatically. Here is a subset of the attributes that will be completed:
In [13]: df2.<TAB>
df2.A                  df2.boxplot
df2.abs                df2.C
df2.add                df2.clip
df2.add_prefix         df2.clip_lower
df2.add_suffix         df2.clip_upper
df2.align              df2.columns
df2.all                df2.combine
df2.any                df2.combineAdd
df2.append             df2.combine_first
df2.apply              df2.combineMult
df2.applymap           df2.compound
df2.as_blocks          df2.consolidate
df2.asfreq             df2.convert_objects
df2.as_matrix          df2.copy
df2.astype             df2.corr
df2.at                 df2.corrwith
df2.at_time            df2.count
df2.axes               df2.cov
df2.B                  df2.cummax
df2.between_time       df2.cummin
df2.bfill              df2.cumprod
df2.blocks             df2.cumsum
df2.bool               df2.D
As you can see, the columns A, B, C, and D are tab completed automatically. E is there as well; the rest of the attributes have been truncated for brevity.
Viewing Data
See the Basics section.
View the top and bottom rows of the frame:
In [14]: df.head()
Out[14]:
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
In [15]: df.tail(3)
Out[15]:
                   A         B         C         D
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
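Called without an argument, head() and tail() return five rows, which is why head() above drops only the last of the six dates. A small sketch, using a frame filled with np.arange instead of random numbers (an assumption made here so the result is deterministic):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

top = df.head()       # first 5 rows by default
bottom = df.tail(3)   # last 3 rows

print(len(top), len(bottom))
print(bottom.index[0])
```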
Display the index, the columns, and the underlying NumPy data:
In [16]: df.index
Out[16]:
[2013-01-01, ..., 2013-01-06]
Length: 6, Freq: D, Timezone: None
In [17]: df.columns
Out[17]: Index([u'A', u'B', u'C', u'D'], dtype='object')
In [18]: df.values
Out[18]:
array([[ 0.4691, -0.2829, -1.5091, -1.1356],
       [ 1.2121, -0.1732,  0.1192, -1.0442],
       [-0.8618, -2.1046, -0.4949,  1.0718],
       [ 0.7216, -0.7068, -1.0396,  0.2719],
       [-0.425 ,  0.567 ,  0.2762, -1.0874],
       [-0.6737,  0.1136, -1.4784,  0.525 ]])
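In newer pandas releases (0.24 and later), DataFrame.to_numpy() is the documented way to obtain the same array that df.values returns. A sketch, assuming such a release is installed. The second frame, named mixed here, is a made-up example showing that columns of different types fall back to an object array:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

arr = df.to_numpy()   # same data as df.values, without index or column labels
print(type(arr), arr.shape)

# Hypothetical mixed-type frame: integers and strings share no
# numeric dtype, so the resulting array has dtype object.
mixed = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
print(mixed.to_numpy().dtype)
```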
describe() shows a quick statistical summary of your data:
In [19]: df.describe()
Out[19]:
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.073711 -0.431125 -0.687758 -0.233103
std    0.843157  0.922818  0.779887  0.973118
min   -0.861849 -2.104569 -1.509059 -1.135632
25%   -0.611510 -0.600794 -1.368714 -1.076610
50%    0.022070 -0.228039 -0.767252 -0.386188
75%    0.658444  0.041933 -0.034326  0.461706
max    1.212112  0.567020  0.276232  1.071804
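The summary returned by describe() is itself a DataFrame, so individual statistics can be read back by label. A minimal sketch on a small hand-made frame (the data below is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})
summary = df.describe()

print(list(summary.index))        # names of the statistics, one per row
print(summary.loc["mean", "A"])   # 2.5
print(summary.loc["50%", "B"])    # 25.0 (the median)
```

The std row uses the sample standard deviation, the same value df.std() returns.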
Transposing your data:
In [20]: df.T
Out[20]:
   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690
B   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648
C   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427
D   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988
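Transposing swaps the row index with the column labels, so a cell at (date, column) moves to (column, date). A short sketch with deterministic data (np.arange is used here in place of the random values, as an assumption for repeatability):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

t = df.T
print(t.shape)                 # rows and columns are swapped
print(list(t.index))           # the former column labels
print(t.loc["B", dates[2]])    # same cell as df.loc[dates[2], "B"]
```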
Sorting by an axis:
In [21]: df.sort_index(axis=1,ascending=False)
Out[21]:
                   D         C         B         A
2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
2013-01-02 -1.044236  0.119209 -0.173215  1.212112
2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
2013-01-04  0.271860 -1.039575 -0.706771  0.721555
2013-01-05 -1.087401  0.276232  0.567020 -0.424972
2013-01-06  0.524988 -1.478427  0.113648 -0.673690
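In sort_index, axis=1 orders the column labels while axis=0 (the default) orders the row index. Both return a new frame and leave the original untouched. A sketch under the same deterministic-data assumption as before:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

by_columns = df.sort_index(axis=1, ascending=False)   # columns become D, C, B, A
by_rows = df.sort_index(axis=0, ascending=False)      # latest date first

print(list(by_columns.columns))
print(by_rows.index[0])
print(list(df.columns))   # the original frame keeps its order
```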
Sorting by values (DataFrame.sort was removed from pandas; sort_values is its replacement):
In [22]: df.sort_values(by='B')
Out[22]:
                   A         B         C         D
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
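On current pandas releases, sorting rows by a column's values is done with sort_values. The sketch below uses a tiny made-up frame so the resulting order is easy to verify by eye:

```python
import pandas as pd

# Hypothetical data, chosen so the sorted order is obvious.
df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, -2.1, 0.1]}, index=["x", "y", "z"])

ascending = df.sort_values(by="B")                     # smallest B first
descending = df.sort_values(by="B", ascending=False)   # largest B first

print(list(ascending.index))    # ['y', 'z', 'x']
print(list(descending.index))   # ['x', 'z', 'y']
```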
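Selection
Note: While standard Python / NumPy expressions for selecting and setting are intuitive and handy for interactive work, for production code we recommend the optimized pandas data access methods .at, .iat, .loc and .iloc. (Older pandas releases also offered .ix, which was deprecated in 0.20 and removed in 1.0.)
See the indexing documentation: Indexing and Selecting Data, and MultiIndex / Advanced Indexing.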
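As a rough illustration of how the four accessors differ: .loc and .at work with labels, .iloc and .iat with integer positions, and the .at/.iat pair returns a single scalar. The frame below is filled with np.arange, an assumption made so the selected values are predictable:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

row_by_label = df.loc[dates[0]]                               # one row, as a Series
block_by_label = df.loc["20130102":"20130104", ["A", "B"]]    # label slices include both ends
block_by_pos = df.iloc[3:5, 0:2]                              # position slices exclude the end
one_by_label = df.at[dates[0], "A"]                           # scalar lookup by label
one_by_pos = df.iat[1, 1]                                     # scalar lookup by position

print(block_by_label.shape, block_by_pos.shape)
print(one_by_label, one_by_pos)
```

Note the difference in slice semantics: the label slice returns three rows (both endpoints included), while the positional slice 3:5 returns two.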
Getting
Selecting a single column, which yields a Series, equivalent to df.A:
In [23]: df['A']
Out[23]:
2013-01-01    0.469112
2013-01-02    1.212112
2013-01-03   -0.861849
2013-01-04    0.721555
2013-01-05   -0.424972
2013-01-06   -0.673690
Freq: D, Name: A, dtype: float64
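Bracket access and attribute access return the same Series, but attribute access only works when the column name does not clash with an existing DataFrame attribute. A small sketch. The column named count is a hypothetical example chosen because it collides with the DataFrame.count method:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "count": [10, 20]})

col = df["A"]    # bracket access always refers to the column
same = df.A      # attribute access, equivalent here

print(col.equals(same))
print(list(df["count"]))   # the column
print(callable(df.count))  # df.count is the method, not the column
```

For that reason bracket access is the safer choice in scripts, while attribute access stays convenient for interactive use.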
|