【Python|Kaggle】Machine Learning Series: Pandas Basics Exercises (6)


Preface

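Hello, friends!
Thank you very much for reading Haihong's article. If you spot any mistakes, please point them out~

About me ଘ(੭ˊᵕˋ)੭
Nickname: 海轰 (Haihong)
Tags: programmer | C++ coder | student
Bio: Got into programming through C, then transferred into a computer science major; lucky enough to have won some national and provincial awards… and have been admitted to graduate school by recommendation. Currently learning C++/Linux/Python
Study tips: solid fundamentals + take plenty of notes + write plenty of code + think a lot + learn English well!

A Python beginner, just starting out
This article is only my own study notes, for building up and reviewing my knowledge
Quality over quantity: learn one problem, understand one problem
Know not just how, but why!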


Introduction

Run the following cell to load your data and some utility functions.

Run the code below to import the libraries, datasets, and other setup needed for the exercises.

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.renaming_and_combining import *
print("Setup complete.")

Exercises

View the first several lines of your data by running the cell below:

reviews.head()

The data used: the first rows returned by reviews.head() appear as a screenshot in the original post and are not reproduced here.

1.

Question

region_1 and region_2 are pretty uninformative names for locale columns in the dataset. Create a copy of reviews with these columns renamed to region and locale, respectively.

Solution

What the question asks:

Rename the columns region_1 and region_2 to region and locale. In other words, this only changes the column names.

renamed = reviews.rename(columns={'region_1':'region','region_2':'locale'})

Output: a screenshot in the original post, not reproduced here.


Alternative reference solution:

renamed = reviews.rename(columns=dict(region_1='region', region_2='locale'))
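To see what rename does on its own, here is a small sketch on a made-up DataFrame. The rows are hypothetical and not taken from the wine dataset. The sketch checks two things. First, rename returns a new DataFrame and leaves reviews untouched. Second, mapping keys that match no column are ignored by default.

```python
import pandas as pd

# Hypothetical stand-in for the wine reviews data (made-up values)
reviews = pd.DataFrame({
    "country": ["Italy", "US"],
    "region_1": ["Etna", "Napa Valley"],
    "region_2": [None, "Napa"],
})

# rename() returns a new DataFrame; the original keeps its column names
renamed = reviews.rename(columns={"region_1": "region", "region_2": "locale"})
print(list(renamed.columns))  # ['country', 'region', 'locale']
print(list(reviews.columns))  # ['country', 'region_1', 'region_2']

# Keys that match no column are silently skipped (errors="ignore" is the default)
same = reviews.rename(columns={"no_such_column": "x"})
print(list(same.columns))     # ['country', 'region_1', 'region_2']
```

Passing errors="raise" instead makes a misspelled column name fail loudly rather than being skipped.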

2.

Question

Set the index name in the dataset to wines.

Solution

What the question asks:

Give the index (row) axis the name wines.

reindexed = reviews.rename_axis('wines', axis='rows')
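Here is a minimal sketch of rename_axis on a hypothetical one-column DataFrame with made-up data. It shows that only the index's name changes and that the original frame is not modified. It also shows that the keyword form index="wines" gives the same result as axis="rows".

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
reviews = pd.DataFrame({"points": [87, 90, 85]})

# Name the row axis; "rows" is accepted as an alias for axis 0 ("index")
reindexed = reviews.rename_axis("wines", axis="rows")
print(reindexed.index.name)       # wines
print(reviews.index.name)         # None (the original is unchanged)

# The keyword form does the same thing
also_reindexed = reviews.rename_axis(index="wines")
print(also_reindexed.index.name)  # wines
```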

运行结果:

【Python|Kaggle】机器学习系列之Pandas基础练习题(六)_人工智能_03

3.

Question

The Things on Reddit dataset includes product links from a selection of top-ranked forums (“subreddits”) on reddit.com. Run the cell below to load a dataframe of products mentioned on the /r/gaming subreddit and another dataframe for products mentioned on the /r/movies subreddit.

Run the code below to load the two datasets needed for this question.

gaming_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/g/gaming.csv")
gaming_products['subreddit'] = "r/gaming"
movie_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/m/movies.csv")
movie_products['subreddit'] = "r/movies"
gaming_products

Create a DataFrame of products mentioned on either subreddit.

Solution

What the question asks:

Combine the two datasets by stacking their rows, one on top of the other.

combined_products = pd.concat([gaming_products, movie_products])
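By default, pd.concat keeps each frame's original row labels, so the combined index contains repeats. The toy tables below show this. Their column names and values are hypothetical, not the real Reddit CSVs. The sketch also shows the ignore_index=True option, which builds a fresh index instead.

```python
import pandas as pd

# Hypothetical mini versions of the two subreddit tables (same columns)
gaming_products = pd.DataFrame({"name": ["Xbox controller", "Nintendo Switch"],
                                "total_mentions": [12, 30]})
gaming_products["subreddit"] = "r/gaming"  # scalar is broadcast to every row
movie_products = pd.DataFrame({"name": ["Blu-ray player"],
                               "total_mentions": [7]})
movie_products["subreddit"] = "r/movies"

# Stack rows; each frame keeps its own index labels, so 0 appears twice
combined_products = pd.concat([gaming_products, movie_products])
print(combined_products.index.tolist())  # [0, 1, 0]

# ignore_index=True replaces the labels with a fresh 0..n-1 range
combined_clean = pd.concat([gaming_products, movie_products], ignore_index=True)
print(combined_clean.index.tolist())     # [0, 1, 2]
print(combined_clean["subreddit"].value_counts().to_dict())  # {'r/gaming': 2, 'r/movies': 1}
```

The subreddit column added before concatenating is what still tells the rows apart after they are stacked.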

Output: a screenshot in the original post, not reproduced here.

4.

Question

The Powerlifting Database dataset on Kaggle includes one CSV table for powerlifting meets and a separate one for powerlifting competitors. Run the cell below to load these datasets into dataframes:

Run the code below to load the datasets needed for this question.

powerlifting_meets = pd.read_csv("../input/powerlifting-database/meets.csv")
powerlifting_competitors = pd.read_csv("../input/powerlifting-database/openpowerlifting.csv")
powerlifting_meets,powerlifting_competitors

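The first dataset, powerlifting_meets, appears as a screenshot in the original post (note its number of columns).

The second dataset, powerlifting_competitors, also appears as a screenshot in the original post (again, note its number of columns).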


Both tables include references to a MeetID, a unique key for each meet (competition) included in the database. Using this, generate a dataset combining the two tables into one.

Solution

What the question asks:

Join the two datasets side by side (column-wise), matching rows on MeetID.

powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
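Here is a small sketch of the MeetID join using made-up miniature tables. Apart from MeetID, the column names and values are hypothetical. Since join matches on the index, both sides are indexed by MeetID first. With the default how="left", a meet with no competitors still appears, with NaN in the competitor columns. A roughly equivalent pd.merge call is included for comparison.

```python
import pandas as pd

# Hypothetical miniature tables; the real CSVs have many more columns
powerlifting_meets = pd.DataFrame({
    "MeetID": [0, 1, 2],
    "MeetName": ["Open A", "Open B", "Open C"],
})
powerlifting_competitors = pd.DataFrame({
    "MeetID": [0, 0, 1],
    "Name": ["Alice", "Bob", "Carol"],
})

# join() aligns on the index, so index both tables by MeetID first.
# Default how="left": meet 2 has no competitors and gets Name = NaN.
powerlifting_combined = (
    powerlifting_meets.set_index("MeetID")
    .join(powerlifting_competitors.set_index("MeetID"))
)
print(powerlifting_combined)

# Roughly equivalent merge; MeetID stays an ordinary column
merged = pd.merge(powerlifting_meets, powerlifting_competitors,
                  on="MeetID", how="left")
print(len(powerlifting_combined), len(merged))  # 4 4
```

Either form works here. merge is convenient when you would rather keep MeetID as a column than as the index.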

Output: a screenshot in the original post, not reproduced here.

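Conclusion

This article is only a set of study notes, recording a journey from 0 to 1.

I hope it helps you. If there are any mistakes, corrections are very welcome~

I'm 海轰 (Haihong) ଘ(੭ˊᵕˋ)੭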

