Learning pandas (Part 3): Grouping

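groupby(): the grouping function. It decides which rows belong together, based on the values of one or more columns or index levels.
.agg(): the aggregation function. It decides how each column within a group is reduced, for example with 'mean', 'min', or 'max'.
.apply(): usually takes a lambda expression or a named function and runs it as the operation.
div(): divides each value in a DataFrame by the specified value.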
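As a quick orientation, the four functions above can be tried on a tiny table. This is a minimal sketch: the `df` frame and its `team` and `points` columns are hypothetical and are not one of the datasets used below.

```python
import pandas as pd

# Hypothetical toy data, not one of the article's datasets.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "points": [10, 20, 30, 50],
})

# groupby() splits the rows by key; the reduction that follows runs per group.
total_by_team = df.groupby("team")["points"].sum()

# agg() applies one or more reductions to each group.
stats_by_team = df.groupby("team")["points"].agg(["min", "max"])

# apply() runs a function on every element of a Series.
doubled = df["points"].apply(lambda x: x * 2)

# div() divides element-wise; a scalar divides every value.
share = df[["points"]].div(df["points"].sum())

print(total_by_team)
print(stats_by_team)
print(doubled)
print(share)
```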

1. Dataset 1

1.1 Which continent drinks more beer on average?

drinks.groupby('continent').beer_servings.mean()
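To read the answer off directly, the grouped means can be sorted or passed to `idxmax()`. The sketch below uses a few made-up rows in place of the real `drinks` table; only the column names `continent` and `beer_servings` come from the code above, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
drinks = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "beer_servings": [100, 300, 50, 150],
    "continent": ["EU", "EU", "AS", "AS"],
})

# Mean beer servings per continent.
beer_mean = drinks.groupby("continent").beer_servings.mean()

# The index label of the largest mean answers "which continent".
top_continent = beer_mean.idxmax()

print(beer_mean.sort_values(ascending=False))
print(top_continent)
```

Note that `groupby()` drops rows whose key is missing (NaN) by default; pass `dropna=False` to keep them as their own group.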

1.2 Print the mean alcohol consumption per continent for every column

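# numeric_only=True skips non-numeric columns; pandas 2.0+ raises a TypeError on them otherwise
drinks.groupby('continent').mean(numeric_only=True)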
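In pandas 2.0 and later, calling `mean()` or `median()` on a grouped DataFrame that still contains a text column raises a TypeError, because non-numeric columns are no longer dropped silently. The sketch below shows two ways around that, assuming the table has a text column such as `country`; the rows themselves are hypothetical.

```python
import pandas as pd

# Hypothetical rows; "country" is a text column that cannot be averaged.
drinks = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "beer_servings": [100, 300, 50, 150],
    "wine_servings": [10, 30, 0, 20],
    "continent": ["EU", "EU", "AS", "AS"],
})

# Option 1: let pandas skip the non-numeric columns.
means = drinks.groupby("continent").mean(numeric_only=True)

# Option 2: select the numeric columns explicitly before reducing.
means_explicit = drinks.groupby("continent")[["beer_servings", "wine_servings"]].mean()

print(means)
```

The same `numeric_only=True` argument works for `median()`.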

1.3 Print the median alcohol consumption per continent for every column

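# numeric_only=True skips non-numeric columns; pandas 2.0+ raises a TypeError on them otherwise
drinks.groupby('continent').median(numeric_only=True)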

1.4 Print the mean, minimum, and maximum spirit consumption per continent

drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])
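When the result columns need more descriptive names than `mean`, `min`, and `max`, pandas also accepts named aggregation, where each keyword becomes an output column. This is a sketch of that variant, not the approach used above; the output names `mean_spirit`, `min_spirit`, and `max_spirit` and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
drinks = pd.DataFrame({
    "continent": ["EU", "EU", "AS", "AS"],
    "spirit_servings": [100, 200, 0, 60],
})

# Named aggregation: keyword = output column name, value = reduction.
spirit_stats = drinks.groupby("continent").spirit_servings.agg(
    mean_spirit="mean",
    min_spirit="min",
    max_spirit="max",
)

print(spirit_stats)
```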

2. Dataset 2

2.1 Compute the mean age for each occupation

users.groupby('occupation').age.mean()

2.2 Find the percentage of men in each occupation and sort it from highest to lowest

# create a function
def gender_to_numeric(x):
    if x == 'M':
        return 1
    if x == 'F':
        return 0

# apply() usually takes a lambda expression or a named function as the operation
# apply the function to the gender column and create a new column
users['gender_n'] = users['gender'].apply(gender_to_numeric)


a = users.groupby('occupation').gender_n.sum() / users.occupation.value_counts() * 100 

# sort from the most male occupation to the least
a.sort_values(ascending = False)
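The same percentage can be computed without a helper column, because the mean of a boolean Series is the fraction of True values. The following is an alternative sketch rather than the method used above, and the `users` rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
users = pd.DataFrame({
    "occupation": ["engineer", "engineer", "engineer", "nurse", "nurse", "artist"],
    "gender": ["M", "M", "F", "F", "F", "M"],
})

# True where the user is male; the group mean is then the male fraction.
is_male = users["gender"].eq("M")
male_pct = (is_male.groupby(users["occupation"]).mean() * 100).sort_values(ascending=False)

print(male_pct)
```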

2.3 For each combination of occupation and gender, compute the mean age

users.groupby(['occupation', 'gender']).age.mean()

2.4 For each occupation, compute the minimum and maximum age

# agg() aggregates a Series, a DataFrame, or the result of groupby()
# groupby decides which rows form a group; agg decides how each column in the group is reduced
users.groupby('occupation').age.agg(['min', 'max'])
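The result of `agg(['min', 'max'])` is an ordinary DataFrame with one column per function, so it can be extended with further columns. A small sketch with hypothetical rows, adding an age range (the `range` column is an illustration, not part of the exercise):

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
users = pd.DataFrame({
    "occupation": ["engineer", "engineer", "nurse", "nurse"],
    "age": [25, 40, 30, 52],
})

# One row per occupation, one column per aggregation function.
age_stats = users.groupby("occupation").age.agg(["min", "max"])

# Derived column computed from the aggregated values.
age_stats["range"] = age_stats["max"] - age_stats["min"]

print(age_stats)
```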

2.5 For each occupation, compute the percentage of women and men

# create a data frame and apply count to gender
# group by both 'occupation' and 'gender', then count the rows in each group via the gender column
gender_ocup = users.groupby(['occupation', 'gender']).agg({'gender': 'count'})
gender_ocup.head()

# create a DataFrame and apply count for each occupation
occup_count = users.groupby(['occupation']).agg('count')
occup_count.head()

# divide gender_ocup by occup_count and multiply by 100
# div() divides each value in a DataFrame by the given value; level="occupation" matches rows on that index level
occup_gender = gender_ocup.div(occup_count, level = "occupation") * 100
occup_gender.head()

# show every row of the 'gender' column
occup_gender.loc[: , 'gender']
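For comparison, `pd.crosstab` with `normalize="index"` yields the same percentages in one step, as a table with one row per occupation and one column per gender. This is an alternative sketch with hypothetical rows, not the approach taken above.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
users = pd.DataFrame({
    "occupation": ["engineer", "engineer", "engineer", "nurse", "nurse", "artist"],
    "gender": ["M", "M", "F", "F", "F", "M"],
})

# normalize="index" divides each count by its row total.
gender_pct = pd.crosstab(users["occupation"], users["gender"], normalize="index") * 100

print(gender_pct)
```

Unlike the groupby result, combinations that never occur (for example a male nurse in these rows) appear as 0 rather than being absent.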

3. Dataset 3

3.1 Compute the mean values for the Nighthawks regiment

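# numeric_only=True skips non-numeric columns; pandas 2.0+ raises a TypeError on them otherwise
regiment[regiment['regiment'] == 'Nighthawks'].groupby('regiment').mean(numeric_only=True)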

3.2 Show the mean preTestScore grouped by regiment and company, without a hierarchical index

# when the result has a multi-level index, unstack() pivots the innermost index level into columns
regiment.groupby(['regiment', 'company']).preTestScore.mean().unstack()
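There are two common ways to get rid of the hierarchical index: `unstack()` gives a wide table with one column per company, while `reset_index()` gives a flat, long table with the keys as ordinary columns. The sketch below shows both on hypothetical rows; only the column names come from the code above.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
regiment = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Dragoons", "Dragoons"],
    "company": ["1st", "2nd", "1st", "2nd"],
    "preTestScore": [4, 24, 3, 2],
})

# Series indexed by a two-level MultiIndex (regiment, company).
grouped = regiment.groupby(["regiment", "company"]).preTestScore.mean()

# Wide form: rows are regiments, columns are companies.
wide = grouped.unstack()

# Long form: the keys become ordinary columns again.
flat = grouped.reset_index()

print(wide)
print(flat)
```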

3.3 Iterate over the groups and print each regiment's name and all of its data

# Group the dataframe by regiment, and for each regiment,
for name, group in regiment.groupby('regiment'):
    # print the name of the regiment
    print(name)
    # print the data of that regiment
    print(group)
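Iterating a groupby object yields `(name, sub-DataFrame)` pairs, so the loop can also feed a comprehension, and `get_group()` fetches a single group without looping. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
regiment = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Dragoons"],
    "name": ["Miller", "Jacobson", "Cooze"],
    "preTestScore": [4, 24, 3],
})

groups = regiment.groupby("regiment")

# Each iteration yields the group key and the rows belonging to it.
sizes = {name: len(group) for name, group in groups}

# Fetch one group directly by its key.
nighthawks = groups.get_group("Nighthawks")

print(sizes)
print(nighthawks)
```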

Original article: https://blog.csdn.net/weixin_44026026/article/details/126024829