[Python | Kaggle] Machine Learning Series: Pandas Basics Exercises (Part 6) - CFANZ Programming Community

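Preface

Hello, everyone!
Thank you very much for reading Haihong's article. If you spot any mistakes, please point them out.

About me ଘ(੭ˊᵕˋ)੭
Nickname: Haihong (海轰)
Tags: programmer | C++ enthusiast | student
Bio: I got into programming through C, later switched to a computer science major, and was lucky enough to win some national and provincial awards… I have been recommended for admission to graduate school. I am currently learning C++/Linux/Python.
Study tips: solid fundamentals + lots of notes + lots of coding + lots of thinking + good English!

I am a Python beginner.
This article is only my own study notes, written to build up my knowledge and for later review.
It is not about how many problems you do; understand every problem you study.
Know not only what works, but also why!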

Introduction

Run the following cell to load your data and some utility functions.

Run the code below to import the libraries, dataset, and other resources needed for the exercises.

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.renaming_and_combining import *
print("Setup complete.")

Exercises

View the first several lines of your data by running the cell below:

reviews.head()

The data used: [figure: first rows of the reviews DataFrame]

1.

Problem

region_1 and region_2 are pretty uninformative names for locale columns in the dataset. Create a copy of reviews with these columns renamed to region and locale, respectively.

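Solution

What the problem asks:

Rename the region_1 and region_2 columns to region and locale. In other words, only the column names change; rename returns a new DataFrame and leaves reviews untouched.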

renamed = reviews.rename(columns={'region_1':'region','region_2':'locale'})

Result: [figure: the DataFrame with the renamed columns]

Alternative reference solution:

renamed = reviews.rename(columns=dict(region_1='region', region_2='locale'))
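
To try the renaming without the Kaggle dataset, here is a minimal sketch on a tiny stand-in table. The name reviews_demo and its values are made up for illustration; only the column names region_1 and region_2 come from the exercise.

```python
import pandas as pd

# Toy stand-in for the wine reviews table (values are invented for illustration).
reviews_demo = pd.DataFrame({
    "country": ["Italy", "US"],
    "region_1": ["Etna", "Willamette Valley"],
    "region_2": [None, "Willamette Valley"],
})

# rename returns a new DataFrame; the original keeps its column names.
renamed_demo = reviews_demo.rename(columns={"region_1": "region", "region_2": "locale"})

print(list(reviews_demo.columns))
print(list(renamed_demo.columns))
```

Both the dictionary literal and the dict(...) form build the same old-name to new-name mapping, so they should behave identically here.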

2.

Problem

Set the index name in the dataset to wines.

Solution

What the problem asks:

Give the index (row) axis the name wines.

reindexed = reviews.rename_axis('wines', axis='rows')

Result: [figure: the DataFrame with its index named wines]
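
As a quick check of what rename_axis changes, the sketch below uses a small made-up table (demo is a hypothetical name, not part of the exercise). It names the row axis, and also shows that the same method can name the column axis.

```python
import pandas as pd

# Toy table; the values are invented for illustration.
demo = pd.DataFrame({"points": [87, 90], "price": [15.0, 22.0]})

# Name the row index "wines"; the data and index labels are unchanged.
reindexed_demo = demo.rename_axis("wines", axis="rows")

# The same method can label the column axis instead.
labeled_columns = demo.rename_axis("fields", axis="columns")

print(reindexed_demo.index.name)
print(labeled_columns.columns.name)
```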

3.

Problem

The Things on Reddit dataset includes product links from a selection of top-ranked forums ("subreddits") on reddit.com. Run the cell below to load a dataframe of products mentioned on the /r/gaming subreddit and another dataframe for products mentioned on the /r/movies subreddit.

Run the code below to load the two datasets needed for this problem.

gaming_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/g/gaming.csv")
gaming_products['subreddit'] = "r/gaming"
movie_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/m/movies.csv")
movie_products['subreddit'] = "r/movies"
gaming_products

Create a DataFrame of products mentioned on either subreddit.

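Solution

What the problem asks:

Combine the two datasets by stacking their rows. Both tables share the same columns, so pd.concat is enough.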

combined_products = pd.concat([gaming_products, movie_products])

Result: [figure: the combined products DataFrame]
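
One detail worth seeing on a small example: pd.concat keeps each table's original index, so row labels can repeat in the result. The sketch below uses two tiny invented tables (gaming_demo and movie_demo are hypothetical names) and shows ignore_index=True as one way to get a fresh index. The exercise itself does not require this.

```python
import pandas as pd

# Toy stand-ins for the two subreddit tables (values are invented).
gaming_demo = pd.DataFrame({"name": ["controller", "headset"], "subreddit": "r/gaming"})
movie_demo = pd.DataFrame({"name": ["poster"], "subreddit": "r/movies"})

# Stack the rows; each table keeps its own index labels.
combined_demo = pd.concat([gaming_demo, movie_demo])

# Optional: renumber the rows from 0 in the combined table.
combined_fresh = pd.concat([gaming_demo, movie_demo], ignore_index=True)

print(list(combined_demo.index))
print(list(combined_fresh.index))
```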

4.

Problem

The Powerlifting Database dataset on Kaggle includes one CSV table for powerlifting meets and a separate one for powerlifting competitors. Run the cell below to load these datasets into dataframes:

Run the code below to load the datasets needed for this problem.

powerlifting_meets = pd.read_csv("../input/powerlifting-database/meets.csv")
powerlifting_competitors = pd.read_csv("../input/powerlifting-database/openpowerlifting.csv")
powerlifting_meets,powerlifting_competitors

The first dataset looks like this (note the number of columns): [figure: powerlifting_meets]

The second dataset looks like this (note the number of columns): [figure: powerlifting_competitors]

Both tables include references to a MeetID, a unique key for each meet (competition) included in the database. Using this, generate a dataset combining the two tables into one.

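Solution

What the problem asks:

Join the two datasets side by side (column-wise) using MeetID as the key. Setting MeetID as the index of both tables lets join match rows on it.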

powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))

Result: [figure: the joined powerlifting DataFrame]
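
The index-based join can be sketched on two tiny invented tables (meets_demo and competitors_demo are hypothetical names; only the MeetID key comes from the exercise). join defaults to a left join, so each meet row is repeated once per matching competitor. A merge on the MeetID column is shown as a comparable alternative, not as the exercise's required approach. Note that join raises an error if the two tables share other column names, unless lsuffix or rsuffix is given.

```python
import pandas as pd

# Toy stand-ins for the meets and competitors tables (values are invented).
meets_demo = pd.DataFrame({"MeetID": [0, 1], "MeetName": ["Open A", "Open B"]})
competitors_demo = pd.DataFrame({"MeetID": [0, 0, 1], "Name": ["Ann", "Bo", "Cy"]})

# Index both tables by the shared key, then join on the index (left join by default).
joined_demo = meets_demo.set_index("MeetID").join(competitors_demo.set_index("MeetID"))

# A comparable result using merge, which keeps MeetID as an ordinary column.
merged_demo = meets_demo.merge(competitors_demo, on="MeetID", how="left")

print(joined_demo)
print(merged_demo)
```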

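Conclusion

This article is only a set of study notes, recording the journey from 0 to 1.

I hope it helps you. If you find any mistakes, corrections are welcome.

I am Haihong ଘ(੭ˊᵕˋ)੭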
