Posted in 威利斯人娱乐, 2019-06-08 07:15
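A powerful Python data-manipulation library: a detailed guide to to_dict in pandas

Summary: to_dict in pandas converts DataFrame data into other structures

When processing back-end data, the structure we use most often is the two-dimensional table. DataFrame helps us process tabular data quickly: it can work on whole rows at a time or on a single column at a time, which makes it a flexible and very practical data-processing tool.

First we introduce the basic pandas data structures, including the basic properties and common operations of data types, indexes, and axes. Start by importing numpy and pandas.

to_dict supports six conversion types, selected with the orient argument: 'dict', 'list', 'series', 'split', 'records', 'index'. Each is described in turn below.

1. Generating data

Always keep in mind: data is aligned by default. The link between labels and data persists unless you break it explicitly.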

Help on method to_dict in module pandas.core.frame:
to_dict(orient='dict') method of pandas.core.frame.DataFrame instance
 Convert DataFrame to dictionary.
 Parameters
 ----------
 orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
 Determines the type of the values of the dictionary.
 - dict (default) : dict like {column -> {index -> value}}
 - list : dict like {column -> [values]}
 - series : dict like {column -> Series(values)}
 - split : dict like
  {index -> [index], columns -> [columns], data -> [values]}
 - records : list like
  [{column -> value}, ... , {column -> value}]
 - index : dict like {index -> {column -> value}}
  .. versionadded:: 0.17.0
 Abbreviations are allowed. `s` indicates `series` and `sp`
 indicates `split`.
 Returns
 -------
 result : dict like {column -> {index -> value}}
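
As a quick orientation before the detailed sections, the six orient values from the help text can be compared on a very small frame. This is only a sketch: the frame `df` and its labels are made up for illustration and are not the data used later in the article.

```python
import pandas as pd

# Hypothetical two-row frame, used only to compare the six layouts.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

as_dict = df.to_dict(orient="dict")        # {column -> {index -> value}}
as_list = df.to_dict(orient="list")        # {column -> [values]}
as_series = df.to_dict(orient="series")    # {column -> Series(values)}
as_split = df.to_dict(orient="split")      # index, columns and data as separate entries
as_records = df.to_dict(orient="records")  # [{column -> value}, ...]
as_index = df.to_dict(orient="index")      # {index -> {column -> value}}

for name, value in [("dict", as_dict), ("list", as_list), ("split", as_split),
                    ("records", as_records), ("index", as_index)]:
    print(name, value)
```

Only 'series' keeps pandas objects as values; the other five return plain Python containers.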

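1.1 Setting a DataFrame's index, columns, and values

We start with a brief introduction to the data structures.

1. Using orient='dict'

For example:

Each method is then covered in detail in its own section.

'dict' is also the default value of orient. The data below is a DataFrame, and to_dict turns it into a dictionary of the form {column -> {index -> value}}, which can be viewed as a two-level nested dictionary.

import numpy as np
import pandas as pd
data = pd.DataFrame(np.arange(15).reshape(3, 5), index=['one', 'two', 'three'], columns=['a', 'b', 'c', 'd', 'e'])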
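
To make the three pieces named in the heading concrete, the sketch below rebuilds the same frame and reads back its index, columns, and values. The variable names `row_labels`, `column_labels`, and `values` are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(15).reshape(3, 5),
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d", "e"],
)

row_labels = list(data.index)       # the labels passed as index=
column_labels = list(data.columns)  # the labels passed as columns=
values = data.values                # the underlying 3 x 5 ndarray

print(row_labels)
print(column_labels)
print(values)
```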

### series

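- Take each column's values together with their index labels and combine them into a dictionary

The call above generates a 3 x 5 matrix-like table.

A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floats, Python objects, and so on). The axis labels are collectively called the index. The basic way to create a Series is s = pd.Series(data, index=index)

- Then use the column names as keys, with the dictionaries above as values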
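
The two bullet steps for orient='dict' can be sketched by hand and compared with what to_dict returns. This is an illustration of the resulting layout, not a description of how pandas builds it internally; `manual` is a name introduced here.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(15).reshape(3, 5),
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d", "e"],
)

# Step 1: for each column, pair every index label with its value.
# Step 2: key those inner dictionaries by the column name.
manual = {
    col: {idx: int(data.at[idx, col]) for idx in data.index}
    for col in data.columns
}

data_dict = data.to_dict(orient="dict")
print(manual == data_dict)
print(data_dict["b"]["two"])  # outer key is the column, inner key is the row label
```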

Lookup form: data_dict[key1][key2]

1.2 When the returned data is a two-dimensional list:

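- data_dict is the name given to the result when orient='dict' is used

For example:

data can be of several types:

- key1 is the column key (outer level)

After fetching the list of query results, convert it:

a Python dict

- key2 is the key of the inner dictionary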

employee_data = [x.to_dict() for x in data.items]

an ndarray

data 
Out[9]: 
      pclass        age     embarked                      home.dest     sex
1086     3rd  31.194181      UNKNOWN                        UNKNOWN    male
12       1st  31.194181    Cherbourg                  Paris, France  female
1036     3rd  31.194181      UNKNOWN                        UNKNOWN    male
833      3rd  32.000000  Southampton  Foresvik, Norway Portland, ND    male
1108     3rd  31.194181      UNKNOWN                        UNKNOWN    male
562      2nd  41.000000    Cherbourg                   New York, NY    male
437      2nd  48.000000  Southampton   Somerset / Bernardsville, NJ  female
663      3rd  26.000000  Southampton                        UNKNOWN    male
669      3rd  19.000000  Southampton                        England    male
507      2nd  31.194181  Southampton               Petworth, Sussex    male
In[10]: data_dict=data.to_dict(orient= 'dict')
In[11]: data_dict
Out[11]: 
{'age': {12: 31.19418104265403,
 437: 48.0,
 507: 31.19418104265403,
 562: 41.0,
 663: 26.0,
 669: 19.0,
 833: 32.0,
 1036: 31.19418104265403,
 1086: 31.19418104265403,
 1108: 31.19418104265403},
 'embarked': {12: 'Cherbourg',
 437: 'Southampton',
 507: 'Southampton',
 562: 'Cherbourg',
 663: 'Southampton',
 669: 'Southampton',
 833: 'Southampton',
 1036: 'UNKNOWN',
 1086: 'UNKNOWN',
 1108: 'UNKNOWN'},
 'home.dest': {12: 'Paris, France',
 437: 'Somerset / Bernardsville, NJ',
 507: 'Petworth, Sussex',
 562: 'New York, NY',
 663: 'UNKNOWN',
 669: 'England',
 833: 'Foresvik, Norway Portland, ND',
 1036: 'UNKNOWN',
 1086: 'UNKNOWN',
 1108: 'UNKNOWN'},
 'pclass': {12: '1st',
 437: '2nd',
 507: '2nd',
 562: '2nd',
 663: '3rd',
 669: '3rd',
 833: '3rd',
 1036: '3rd',
 1086: '3rd',
 1108: '3rd'},
 'sex': {12: 'female',
 437: 'female',
 507: 'male',
 562: 'male',
 663: 'male',
 669: 'male',
 833: 'male',
 1036: 'male',
 1086: 'male',
 1108: 'male'}}

df = pd.DataFrame(employee_data)

a scalar
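
The three kinds of input listed for a Series (a dict, an ndarray, a scalar) can be tried side by side. The sketch below uses made-up values; only pd.Series and its index argument come from pandas itself.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0})

# From an ndarray without an index: labels default to 0 .. len(data) - 1.
from_array = pd.Series(np.array([10, 20, 30]))

# From an ndarray with an index of the same length.
from_array_labeled = pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])

# From a scalar: an index is required, and the value is repeated for every label.
from_scalar = pd.Series(5.0, index=["a", "b", "c"])

print(from_dict, from_array, from_array_labeled, from_scalar, sep="\n\n")
```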

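2. Using orient='list'

Each column can be renamed: df.rename(columns=col_name, inplace=True)

The index passed in is a list of axis labels. Depending on what data is, the following cases arise

This is similar to case 1, except that the inner level becomes a list. The structure is {column -> [values]}

Filling missing values: df.fillna('', inplace=True)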
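
The rename and fillna calls mentioned above can be tried on a small list of dictionaries. The contents of `employee_data` and the `col_name` mapping below are invented stand-ins for illustration; the article does not show the real ones.

```python
import pandas as pd

# Hypothetical query results already converted to dictionaries.
employee_data = [
    {"n": "Ann", "d": "IT"},
    {"n": "Bob", "d": None},  # missing department
]
df = pd.DataFrame(employee_data)

# Hypothetical mapping from raw column names to readable ones.
col_name = {"n": "name", "d": "department"}
df.rename(columns=col_name, inplace=True)

# Replace missing values with an empty string.
df.fillna("", inplace=True)
print(df)
```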

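### from ndarray

Lookup form: data_list[keys][index]

2. Querying data

If data is an ndarray, index must have the same length as the ndarray. If no index is passed, one is created with the values [0, ..., len(data) - 1].

data_list is the name given to the result when orient='list' is used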
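
A minimal sketch of the orient='list' lookup follows. It uses a three-row frame with made-up values (column names and index labels loosely follow the article's sample) to keep the output short.

```python
import pandas as pd

data = pd.DataFrame(
    {"age": [31.0, 48.0, 19.0],
     "embarked": ["UNKNOWN", "Southampton", "Southampton"]},
    index=[1086, 437, 669],
)

data_list = data.to_dict(orient="list")

# The outer key is the column name; the inner position is a plain 0-based offset.
first_age = data_list["age"][0]
second_port = data_list["embarked"][1]
print(data_list)
print(first_age, second_port)
```

Note that the original index labels (1086, 437, 669) are not part of the result; only the row order is kept.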

2.1 Getting specific columns and rows

keys is the column key, such as 'age' or 'embarked' in this example

a['x'] returns the column named x

index is the integer position, counting from 0 to the end

a[0:3] returns the first three rows
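
Column selection and row slicing can be sketched on the 3 x 5 frame built earlier. The frame has no column named x, so column 'c' is used here instead, and the slice is shortened to two rows so that it is visibly a subset.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    np.arange(15).reshape(3, 5),
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d", "e"],
)

col = a["c"]        # a single column, returned as a Series
first_two = a[0:2]  # a row slice by position, returned as a DataFrame

print(col)
print(first_two)
```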

```

In[19]: data_list=data.to_dict(orient='list')
In[20]: data_list
Out[20]: 
{'age': [31.19418104265403,
 31.19418104265403,
 31.19418104265403,
 32.0,
 31.19418104265403,
 41.0,
 48.0,
 26.0,
 19.0,
 31.19418104265403],
 'embarked': ['UNKNOWN',
 'Cherbourg',
 'UNKNOWN',
 'Southampton',
 'UNKNOWN',
 'Cherbourg',
 'Southampton',
 'Southampton',
 'Southampton',
 'Southampton'],
 'home.dest': ['UNKNOWN',
 'Paris, France',
 'UNKNOWN',
 'Foresvik, Norway Portland, ND',
 'UNKNOWN',
 'New York, NY',
 'Somerset / Bernardsville, NJ',
 'UNKNOWN',
 'England',
 'Petworth, Sussex'],
 'pclass': ['3rd',
 '1st',
 '3rd',
 '3rd',
 '3rd',
 '2nd',
 '2nd',
 '3rd',
 '3rd',
 '2nd'],
 'sex': ['male',
 'female',
 'male',
 'male',
 'male',
 'male',
 'female',
 'male',
 'male',
 'male']}
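```

2.2 Selecting data by label

Note that pandas supports non-unique index values. If you attempt an operation that does not support duplicate index values, an exception is raised at that point. The check is deferred for performance reasons (many operations, such as parts of GroupBy, never use the index).

3. Using orient='series'

a.loc['one'] selects the row labeled 'one';

### from dict

This produces the structure {column -> Series(values)}

a.loc[:,['a','b']] selects all rows and the columns a and b

When data is a dict and an index is passed, the values whose keys match the index labels are pulled out. Otherwise the index is built from the dict's keys (sorted in old pandas versions; recent versions keep the dict's insertion order).

Call form: data_series[key1][key2] or data_series[key1]

a.loc[['one','two'],['a','b']] selects the rows 'one' and 'two' and the columns a and b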
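
The three .loc forms described for label-based selection can be checked on the same 3 x 5 frame. This is a small sketch; the variable names `row`, `cols`, and `block` are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    np.arange(15).reshape(3, 5),
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d", "e"],
)

row = a.loc["one"]                         # one row, returned as a Series
cols = a.loc[:, ["a", "b"]]                # all rows, two columns
block = a.loc[["one", "two"], ["a", "b"]]  # two rows, two columns

print(row)
print(cols)
print(block)
```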

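data_series is the name given to the result

2.3 Filtering data directly by position

key1 is the column key, such as 'age' or 'embarked' in this example

a.iloc[1:2,1:2] returns the value in the second row and second column

key2 is a label from the data's original index (optional)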
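
A short sketch of position-based selection with .iloc, again on the 3 x 5 frame. It shows that slicing with 1:2 on both axes keeps a one-cell DataFrame, while integer positions return the bare value.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    np.arange(15).reshape(3, 5),
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d", "e"],
)

cell = a.iloc[1:2, 1:2]  # 1 x 1 DataFrame: second row, second column
scalar = a.iloc[1, 1]    # the same position as a plain value

print(cell)
print(scalar)
```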

2.4 Filtering data by condition

Note: NaN is the standard missing-value marker used in pandas

In[21]: data_series=data.to_dict(orient='series')
In[22]: data_series
Out[22]: 
{'age': 1086 31.194181
 12 31.194181
 1036 31.194181
 833 32.000000
 1108 31.194181
 562 41.000000
 437 48.000000
 663 26.000000
 669 19.000000
 507 31.194181
 Name: age, dtype: float64, 'embarked': 1086 UNKNOWN
 12 Cherbourg
 1036 UNKNOWN
 833 Southampton
 1108 UNKNOWN
 562 Cherbourg
 437 Southampton
 663 Southampton
 669 Southampton
 507 Southampton
 Name: embarked, dtype: object, 'home.dest': 1086    UNKNOWN
 12   Paris, France
 1036    UNKNOWN
 833 Foresvik, Norway Portland, ND
 1108    UNKNOWN
 562   New York, NY
 437 Somerset / Bernardsville, NJ
 663    UNKNOWN
 669    England
 507   Petworth, Sussex
 Name: home.dest, dtype: object, 'pclass': 1086 3rd
 12 1st
 1036 3rd
 833 3rd
 1108 3rd
 562 2nd
 437 2nd
 663 3rd
 669 3rd
 507 2nd
 Name: pclass, dtype: object, 'sex': 1086 male
 12 female
 1036 male
 833 male
 1108 male
 562 male
 437 female
 663 male
 669 male
 507 male
 Name: sex, dtype: object}
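
The orient='series' lookup can be sketched on a three-row frame with made-up values. Each dictionary value is a full Series, so the second key is a label from the original index.

```python
import pandas as pd

data = pd.DataFrame(
    {"age": [31.0, 48.0, 19.0],
     "embarked": ["UNKNOWN", "Southampton", "Southampton"]},
    index=[1086, 437, 669],
)

data_series = data.to_dict(orient="series")

ages = data_series["age"]            # a pandas Series that keeps the original index
age_of_437 = data_series["age"][437] # second key is an index label, not a position

print(type(ages))
print(age_of_437)
```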

a[a['one'].isin(['2','3'])] selects all rows whose value in column one is '2' or '3' (isin() picks the rows where the given column holds one of the listed values)
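
The isin() filter can be sketched as follows. The frame here is a made-up one with a string column named one, since the filter above compares against the strings '2' and '3'.

```python
import pandas as pd

# Hypothetical frame with a string-valued column named "one".
a = pd.DataFrame({"one": ["1", "2", "3", "4"], "two": [10, 20, 30, 40]})

mask = a["one"].isin(["2", "3"])  # boolean Series, True where the value is listed
selected = a[mask]                # keep only the rows where the mask is True

print(mask)
print(selected)
```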

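If data is a scalar, an index must be provided; the value is repeated to match the length of the index.

```
In [10]: pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
Out[10]:
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64
```

4. Using orient='split'

3. Data conversion

This produces the structure {index -> [index], columns -> [columns], data -> [values]}: the data, the index, and the column names are separated out and stored as individual dictionary entries

to_dict in pandas converts DataFrame data into other structures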

Access forms: data_split['index'], data_split['data'], data_split['columns']

Six conversion types are available, selected with 'dict', 'list', 'series', 'split', 'records', 'index'

### Series is ndarray-like

data_split=data.to_dict(orient='split')
data_split
Out[38]: 
{'columns': ['pclass', 'age', 'embarked', 'home.dest', 'sex'],
 'data': [['3rd', 31.19418104265403, 'UNKNOWN', 'UNKNOWN', 'male'],
 ['1st', 31.19418104265403, 'Cherbourg', 'Paris, France', 'female'],
 ['3rd', 31.19418104265403, 'UNKNOWN', 'UNKNOWN', 'male'],
 ['3rd', 32.0, 'Southampton', 'Foresvik, Norway Portland, ND', 'male'],
 ['3rd', 31.19418104265403, 'UNKNOWN', 'UNKNOWN', 'male'],
 ['2nd', 41.0, 'Cherbourg', 'New York, NY', 'male'],
 ['2nd', 48.0, 'Southampton', 'Somerset / Bernardsville, NJ', 'female'],
 ['3rd', 26.0, 'Southampton', 'UNKNOWN', 'male'],
 ['3rd', 19.0, 'Southampton', 'England', 'male'],
 ['2nd', 31.19418104265403, 'Southampton', 'Petworth, Sussex', 'male']],
 'index': [1086, 12, 1036, 833, 1108, 562, 437, 663, 669, 507]}
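
Because the three keys of the 'split' result have the same names as the data, index, and columns arguments of the DataFrame constructor, the dictionary can be passed straight back to rebuild a frame. The sketch below uses a three-row frame with made-up values.

```python
import pandas as pd

data = pd.DataFrame(
    {"age": [31.0, 48.0, 19.0],
     "embarked": ["UNKNOWN", "Southampton", "Southampton"]},
    index=[1086, 437, 669],
)

data_split = data.to_dict(orient="split")
print(data_split["index"])
print(data_split["columns"])
print(data_split["data"])

# Unpack the dictionary as keyword arguments to get a DataFrame back.
rebuilt = pd.DataFrame(**data_split)
print(rebuilt)
```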

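3.1 To dict: the default. It produces a dictionary of the form {column : {index : value}}, a two-level nested dictionary. Lookup form: data_dict[key1][key2]

A Series behaves very much like an ndarray and is a valid argument to most NumPy functions. Slicing a Series also slices its index.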
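
A small sketch of the ndarray-like behaviour, using made-up values: a NumPy function applied to a Series returns a Series, and slices and boolean masks carry the labels along.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0, 16.0], index=["a", "b", "c", "d"])

head = s.iloc[:2]          # positional slice; the matching labels come along
roots = np.sqrt(s)         # a NumPy ufunc returns a Series with the same index
above = s[s > s.median()]  # boolean mask, as with an ndarray

print(head)
print(roots)
print(above)
```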

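5. Using orient='records'

3.2 To list: the structure is {column : [values]}; access form: data_list[keys][index]

This produces the structure [{column -> value}, ... , {column -> value}]

3.3 To series: the structure is {column : Series(values)}

The result as a whole is a list; each inner element is a dictionary built from one row of the original data

3.4 To split: the structure is {index : [index], columns : [columns], data : [values]}

Call form: data_records[index][key1]

3.5 To records: the most commonly used form, [{column : value}, ... , {column : value}]; each inner dictionary is built from one row of the original data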

### Series is dict-like

data_records=data.to_dict(orient='records')
data_records
Out[41]: 
[{'age': 31.19418104265403,
 'embarked': 'UNKNOWN',
 'home.dest': 'UNKNOWN',
 'pclass': '3rd',
 'sex': 'male'},
 {'age': 31.19418104265403,
 'embarked': 'Cherbourg',
 'home.dest': 'Paris, France',
 'pclass': '1st',
 'sex': 'female'},
 {'age': 31.19418104265403,
 'embarked': 'UNKNOWN',
 'home.dest': 'UNKNOWN',
 'pclass': '3rd',
 'sex': 'male'},
 {'age': 32.0,
 'embarked': 'Southampton',
 'home.dest': 'Foresvik, Norway Portland, ND',
 'pclass': '3rd',
 'sex': 'male'},
 {'age': 31.19418104265403,
 'embarked': 'UNKNOWN',
 'home.dest': 'UNKNOWN',
 'pclass': '3rd',
 'sex': 'male'},
 {'age': 41.0,
 'embarked': 'Cherbourg',
 'home.dest': 'New York, NY',
 'pclass': '2nd',
 'sex': 'male'},
 {'age': 48.0,
 'embarked': 'Southampton',
 'home.dest': 'Somerset / Bernardsville, NJ',
 'pclass': '2nd',
 'sex': 'female'},
 {'age': 26.0,
 'embarked': 'Southampton',
 'home.dest': 'UNKNOWN',
 'pclass': '3rd',
 'sex': 'male'},
 {'age': 19.0,
 'embarked': 'Southampton',
 'home.dest': 'England',
 'pclass': '3rd',
 'sex': 'male'},
 {'age': 31.19418104265403,
 'embarked': 'Southampton',
 'home.dest': 'Petworth, Sussex',
 'pclass': '2nd',
 'sex': 'male'}]
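
The orient='records' lookup, and the way back to a DataFrame, can be sketched on a three-row frame with made-up values.

```python
import pandas as pd

data = pd.DataFrame(
    {"age": [31.0, 48.0, 19.0],
     "embarked": ["UNKNOWN", "Southampton", "Southampton"]},
    index=[1086, 437, 669],
)

data_records = data.to_dict(orient="records")

# The first key is the 0-based row position, the second is the column name.
second_port = data_records[1]["embarked"]
print(data_records)
print(second_port)

# A list of row dictionaries can be handed back to the DataFrame constructor.
restored = pd.DataFrame(data_records)
print(restored)
```

The original index labels are not stored in the records, so the restored frame gets a fresh 0-based index.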

3.6 To index: the structure is {index : {column : value}}

A Series is like a fixed-size dict: you can get and set values by index label
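
The dict-like side of a Series can be sketched with a few label-based operations. The values below are made up for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

value_b = s["b"]      # read a value by label
s["c"] = 30.0         # overwrite a value by label
has_a = "a" in s      # membership tests look at the index labels
missing = s.get("z")  # get() returns None when the label is absent

print(value_b, s["c"], has_a, missing)
```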

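6. Using orient='index'

4. Processing data

This produces the structure {index -> {column -> value}}. The lookup order is exactly the reverse of 'dict'; working it out is left to the reader

4.1 Modifying, adding, or deleting labels on a given axis; these operations return a copy of the original data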
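
For readers who want to check the 'index' layout against 'dict', the sketch below compares the two on a three-row frame with made-up values: the same cell is reached with the two keys in swapped order.

```python
import pandas as pd

data = pd.DataFrame(
    {"age": [31.0, 48.0, 19.0],
     "embarked": ["UNKNOWN", "Southampton", "Southampton"]},
    index=[1086, 437, 669],
)

data_index = data.to_dict(orient="index")  # row label first, then column
data_dict = data.to_dict(orient="dict")    # column first, then row label

print(data_index[437]["age"])
print(data_dict["age"][437])

# Every cell is reachable both ways, with the key order reversed.
same = all(
    data_index[row][col] == data_dict[col][row]
    for row in data.index
    for col in data.columns
)
print(same)
```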
