[Pandas] The pivot table function pivot_table()

Description from the official documentation:

pandas.DataFrame.pivot_table

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

Function signature

DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)

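Parameters:

values column to aggregate, optional
The column(s) to aggregate. Specify it when you do not want every column to appear in the result.
index column, Grouper, array, or list of the previous
The keys to group by on the rows of the result; can be a list.
columns column, Grouper, array, or list of the previous
The keys whose distinct values become the column headers of the result (typically a string or categorical column). Combinations with no data show up as NaN, which fill_value can replace.
aggfunc function, list of functions, dict, default numpy.mean
The aggregation to apply; the default is aggfunc='mean' (the average).
fill_value scalar, default None
The value used to replace missing values in the table after aggregation; by default they stay NaN.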
margins bool, default False

Add all row / columns (e.g. for subtotal / grand totals).
dropna bool, default True

Do not include columns whose entries are all NaN.
margins_name str, default ‘All’

Name of the row / column that will contain the totals when margins is True.
observed bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Returns:
DataFrame
An Excel style pivot table.
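
To make the roles of the main parameters concrete, here is a minimal sketch on a toy DataFrame. The data and the names `sales` and `table` are made up for illustration and are not part of the article's data set.

```python
import pandas as pd

# Hypothetical toy data, only for illustrating the parameters.
sales = pd.DataFrame({
    'shop': ['A', 'A', 'B', 'B', 'B'],
    'item': ['tea', 'milk', 'tea', 'tea', 'milk'],
    'qty':  [1, 2, 3, 4, 5],
})

# index   -> row labels of the result
# columns -> column labels of the result
# values  -> the column that is aggregated
# aggfunc -> how the values falling into one (row, column) cell are combined
table = sales.pivot_table(values='qty', index='shop', columns='item', aggfunc='sum')
print(table)
```

Shop B sold tea twice (3 and 4), so those two rows are combined into a single cell by the sum.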

Examples:

Import the basic modules and create a DataFrame (the data set could also be read from a file):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num':  [1, 1, 1, 3, 1, 1, 2],
    'price':[2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
    })

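Aggregate by name:

# Index by name. With the default aggregation (mean) only numeric columns can be
# aggregated. Newer pandas releases may raise an error instead of silently dropping
# string columns, so the numeric columns are listed explicitly.
df1 = df.pivot_table(index=['name'], values=['num', 'price'])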
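
One way to read a pivot table that has only an index is as a group-by followed by an aggregation. The sketch below checks that on the article's data; the variable names `by_pivot` and `by_groupby` are illustrative, and only the columns needed here are rebuilt.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'num':   [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
})

# Pivot with only an index: one row per name, mean of each numeric column.
by_pivot = df.pivot_table(index='name', values=['num', 'price'])

# The same numbers via an explicit groupby.
by_groupby = df.groupby('name')[['num', 'price']].mean()

print(by_pivot)
print((by_pivot.values == by_groupby.values).all())
```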

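Multiple index levels:

# Two index levels: operator first, then name. The numeric columns are listed
# explicitly for the same reason as above.
df2 = df.pivot_table(index=['operator', 'name'], values=['num', 'price'])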
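
With a list passed to index, the rows of the result carry a MultiIndex. A small sketch of how such a result might be queried with `.loc` follows; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name':     ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'num':      [1, 1, 1, 3, 1, 1, 2],
    'price':    [2, 3, 4, 15, 2, 3, 10],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

multi = df.pivot_table(index=['operator', 'name'], values=['num', 'price'])

# Select every customer served by the operator 'Penny' (outer index level).
served_by_penny = multi.loc['Penny']

# Select a single cell with a full (operator, name) key plus a column label.
penny_avg_price = multi.loc[('Bernadette', 'Penny'), 'price']

print(served_by_penny)
print(penny_avg_price)
```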

Choose which columns to show:

# values selects the columns to aggregate. With the default mean they must be
# numeric columns; here only price is kept.
df3 = df.pivot_table(index=['operator', 'name'], values=['price'])

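Choose the aggregation method:

# Per-column aggregation: the sum of num, and both the sum and the mean of price.
# String names are used in place of np.sum / np.mean; they select the same aggregations.
df4 = df.pivot_table(index=['operator', 'name'], aggfunc={'num': 'sum', 'price': ['sum', 'mean']})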
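
When aggfunc is a dict of lists, the result has two column levels: the source column and the aggregation name. The sketch below shows how a single statistic might be read back out. It wraps every aggregation in a list, which is a choice made here for a uniform layout, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name':     ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'num':      [1, 1, 1, 3, 1, 1, 2],
    'price':    [2, 3, 4, 15, 2, 3, 10],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

stats = df.pivot_table(
    index=['operator', 'name'],
    aggfunc={'num': ['sum'], 'price': ['sum', 'mean']},
)

# Columns are (source column, aggregation) pairs.
print(stats.columns.tolist())

# Read one statistic for one (operator, name) row.
print(stats.loc[('Bernadette', 'Penny'), ('price', 'sum')])
```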

Use a string column as column headers:

# columns spreads the distinct values of item across the column headers
df5 = df.pivot_table(index=['operator', 'name'], values=['price'], columns='item')

Fill the empty cells with 0:

# fill_value replaces every NaN in the result with 0
df6 = df.pivot_table(index=['operator', 'name'], values=['price'], columns='item', fill_value=0)
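
The effect of columns and fill_value can be checked side by side. This sketch passes values='price' as a plain string rather than a list, an assumption made here so that the result has a single level of column labels; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name':     ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item':     ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'price':    [2, 3, 4, 15, 2, 3, 10],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# One column per item; (operator, name) pairs that never bought an item get NaN.
wide = df.pivot_table(index=['operator', 'name'], values='price', columns='item')

# Same table, with the missing combinations replaced by 0.
filled = df.pivot_table(index=['operator', 'name'], values='price', columns='item',
                        fill_value=0)

print(wide)
print(filled)
```

Raj only bought soda, so his coke cell is NaN in the first table and 0 in the second.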

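Combining several options:

# Combined: three index levels, two value columns, sum aggregation,
# NaN filled with 0, and an 'All' totals row added by margins=True
df7 = df.pivot_table(index=['operator', 'name', 'item'], values=['num', 'price'], aggfunc=['sum'], fill_value=0, margins=True)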
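
To see what margins=True adds, a simpler variant with a single index level may be easier to read. This is a sketch, not the article's exact call: it groups by operator only and uses a plain 'sum' aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    'num':      [1, 1, 1, 3, 1, 1, 2],
    'price':    [2, 3, 4, 15, 2, 3, 10],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# margins=True appends a totals row, labelled by margins_name ('All' by default).
totals = df.pivot_table(index='operator', values=['num', 'price'],
                        aggfunc='sum', margins=True)

print(totals)
```

The 'All' row applies the same aggregation to the whole data set, so here it holds the overall sums of num and price.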

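In addition, the result is an ordinary DataFrame, so the usual DataFrame methods can be applied to it afterwards, for example to filter rows or select columns.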
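
As a sketch of such follow-up processing, the pivot result below is filtered by row and narrowed to one column. The threshold of 5 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'num':   [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
})

spend = df.pivot_table(index='name', values=['num', 'price'], aggfunc='sum')

# Row filter: keep only the customers whose total price exceeds 5.
big_spenders = spend[spend['price'] > 5]

# Column selection: keep only the price column.
price_only = spend[['price']]

print(big_spenders)
print(price_only)
```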

Original article: https://blog.csdn.net/cwtnice/article/details/126399840