[Python | Kaggle] Machine Learning Series: Pandas Basics Exercises (Part 4)

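Preface

Hello, everyone!
Thank you very much for reading Haihong's article. If you spot any mistakes, please point them out.

About me ଘ(੭ˊᵕˋ)੭
Nickname: Haihong (海轰)
Tags: programmer | C++ | student
Bio: I got into programming through C and later switched to a computer science major. I have been fortunate to win some national and provincial awards, and I have been recommended for postgraduate study. I am currently learning C++/Linux/Python.
Study tips: solid fundamentals + lots of notes + lots of coding + lots of thinking + good English!

I am still a Python beginner.
This article is only my own study notes, written to build up my knowledge and to review later.
It is not about doing many problems; it is about understanding each problem you do.
Know what works, and know why it works!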

Introduction

In these exercises we’ll apply groupwise analysis to our dataset.
Run the code cell below to load the data before running the exercises.

First, import the library and load the dataset needed later:

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.grouping_and_sorting import *
print("Setup complete.")
reviews

The dataset used in this exercise is the reviews DataFrame loaded above.


Exercises

1.

Question

Who are the most common wine reviewers in the dataset? Create a Series whose index is the taster_twitter_handle category from the dataset, and whose values count how many reviews each person wrote.

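Solution

What the question asks:

Create a Series whose index is the taster_twitter_handle category from the dataset and whose values count how many reviews each person wrote.
In other words, group by taster_twitter_handle first and then take the size of each group.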

reviews_written = reviews.groupby('taster_twitter_handle').size()


Alternative reference solution:

reviews_written = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()

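Note:

size() counts the rows in each group, including rows with missing values.
count() counts the non-null values in each group, for each selected column.
Both methods are available on grouped DataFrames and on grouped Series. Here they give the same result because groupby drops rows whose taster_twitter_handle is missing.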
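To make the difference between size() and count() concrete, here is a minimal sketch on a made-up four-row table. The toy frame and its values are my own assumption for illustration and are not taken from the wine dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing handle and one missing score.
toy = pd.DataFrame({
    "taster_twitter_handle": ["@a", "@a", "@b", None],
    "points": [90, np.nan, 88, 85],
})

grouped = toy.groupby("taster_twitter_handle")

# size(): number of rows in each group (missing points still count as rows).
sizes = grouped.size()

# count(): number of non-null values in the selected column.
point_counts = grouped["points"].count()

# Counting the grouping column itself matches size(), because rows with a
# missing handle never form a group in the first place.
handle_counts = grouped["taster_twitter_handle"].count()

print(sizes)
print(point_counts)
```

The group "@a" has two rows but only one non-null score, so the two methods disagree on the points column while agreeing on the handle column.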

2.

Question

What is the best wine I can buy for a given amount of money? Create a Series whose index is wine prices and whose values is the maximum number of points a wine costing that much was given in a review. Sort the values by price, ascending (so that 4.0 dollars is at the top and 3300.0 dollars is at the bottom).

Solution

What the question asks:

For each price, find the highest score among the wines at that price.

best_rating_per_price = reviews.groupby('price').points.max()


Alternative reference solution:

best_rating_per_price = reviews.groupby('price')['points'].max().sort_index()
# best_rating_per_price = reviews.groupby('price')['points'].max() is also correct, because groupby sorts the group keys by default
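Why the version without sort_index() also passes can be checked on a small made-up table. This is only a sketch: the toy prices and scores are assumed values, and sort=False is included just to show what the result looks like when the default key sorting is switched off.

```python
import pandas as pd

# Hypothetical toy data, deliberately not in price order.
toy = pd.DataFrame({
    "price": [20.0, 4.0, 20.0, 4.0, 15.0],
    "points": [88, 84, 92, 86, 90],
})

# Default groupby (sort=True): the index comes back sorted by price.
best = toy.groupby("price")["points"].max()

# An explicit sort_index() therefore changes nothing here.
best_sorted = best.sort_index()

# With sort=False the groups keep their order of first appearance.
unsorted_best = toy.groupby("price", sort=False)["points"].max()

print(best)
print(unsorted_best)
```

Calling sort_index() explicitly is still a reasonable habit, since it states the intended order even if someone later passes sort=False.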


3.

Question

What are the minimum and maximum prices for each variety of wine? Create a DataFrame whose index is the variety category from the dataset and whose values are the min and max values thereof.

Solution

What the question asks:

For each wine variety (variety), find the lowest and the highest price.

price_extremes = reviews.groupby('variety').price.agg(['min', 'max'])
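As a small illustration of how agg() with several aggregations produces one column per aggregation, here is a sketch on made-up data. The varieties and prices are assumed values; the aggregations are passed as string names, which pandas resolves to its own min and max.

```python
import pandas as pd

# Hypothetical toy data with one missing price.
toy = pd.DataFrame({
    "variety": ["Merlot", "Merlot", "Riesling", "Riesling"],
    "price": [12.0, 30.0, 9.0, None],
})

# One row per variety, one column per aggregation name.
extremes = toy.groupby("variety")["price"].agg(["min", "max"])

print(extremes)
```

Missing prices are skipped, so a variety with a single known price gets the same value in both columns.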


4.

Question

What are the most expensive wine varieties? Create a variable sorted_varieties containing a copy of the dataframe from the previous question where varieties are sorted in descending order based on minimum price, then on maximum price (to break ties).

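Solution

What the question asks:

Take the minimum and maximum price of each variety from the previous question and sort the rows by minimum price in descending order. When minimum prices are equal, break the tie by maximum price, also in descending order.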

sorted_varieties = price_extremes.sort_values(by=['min', 'max'], ascending=False)
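The tie-breaking behaviour of a two-column sort is easier to see on a tiny table. The sketch below uses a made-up frame with the same column names, min and max; the index labels and numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: B and C tie on "min".
extremes = pd.DataFrame(
    {"min": [10.0, 25.0, 25.0], "max": [40.0, 30.0, 60.0]},
    index=["A", "B", "C"],
)

# Sort by "min" first; rows with equal "min" are ordered by "max".
ranked = extremes.sort_values(by=["min", "max"], ascending=False)

# ascending also accepts one flag per column, e.g. descending "min"
# but ascending "max" within ties.
mixed = extremes.sort_values(by=["min", "max"], ascending=[False, True])

print(ranked)
print(mixed)
```

With a single ascending=False both columns are sorted descending, which is what the question asks for.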


5.

Question

Create a Series whose index is reviewers and whose values is the average review score given out by that reviewer. Hint: you will need the taster_name and points columns.

Solution

What the question asks:

For each taster (taster_name), compute the average of all the scores (points) they gave.

reviewer_mean_ratings = reviews.groupby('taster_name').points.mean()
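A quick way to convince yourself what the grouped mean returns is to run it on a few made-up rows. The names and scores below are assumed values, not real reviewers.

```python
import pandas as pd

# Hypothetical toy data: the last review has no taster name.
toy = pd.DataFrame({
    "taster_name": ["Ann", "Ann", "Bob", None],
    "points": [90, 86, 91, 100],
})

# One average per named taster; the unnamed review is left out
# because groupby drops missing group keys by default.
mean_ratings = toy.groupby("taster_name")["points"].mean()

print(mean_ratings)
```

The result is a float Series indexed by taster name, and reviews without a taster name do not contribute to any average.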


6.

Question

What combination of countries and varieties are most common? Create a Series whose index is a MultiIndex of {country, variety} pairs. For example, a pinot noir produced in the US should map to {"US", "Pinot Noir"}. Sort the values in the Series in descending order based on wine count.

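Solution

What the question asks:

Count the number of reviews for each combination of country (country) and variety (variety), then sort the counts in descending order.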

country_variety_counts = reviews.groupby(['country','variety']).size().sort_values(ascending=False)
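Grouping by two columns yields a Series with a MultiIndex, where each entry is keyed by a (country, variety) tuple. Here is a minimal sketch on made-up rows; the data is assumed and kept free of ties so the sorted order is unambiguous.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "US"],
    "variety": ["Pinot Noir", "Pinot Noir", "Pinot Noir",
                "Merlot", "Merlot", "Merlot"],
})

# Count rows per (country, variety) pair, most common first.
counts = toy.groupby(["country", "variety"]).size().sort_values(ascending=False)

print(counts)

# A single pair is looked up with a tuple.
print(counts.loc[("France", "Merlot")])
```

If a flat table is easier to work with, counts.reset_index(name="n") turns the two index levels back into ordinary columns (the column name n is just my choice here).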


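Conclusion

This article is only a set of study notes, recording the journey from 0 to 1.

I hope it helps. If you find any mistakes, corrections are welcome.

I am Haihong ଘ(੭ˊᵕˋ)੭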
