[Python | Kaggle] Machine Learning Series: Pandas Basics Exercises (Part 2)

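Preface

Hello, friends!
Thank you very much for reading 海轰's article. If you spot any mistakes, please point them out.

About me ଘ(੭ˊᵕˋ)੭
Nickname: 海轰
Tags: programmer | C++ enthusiast | student
Bio: I got into programming through C and later switched to a computer science major. I was fortunate to win some national and provincial awards… and have been recommended for graduate school. I am currently learning C++/Linux/Python.
Study tips: solid fundamentals + lots of notes + lots of coding + lots of thinking + good English!

New to Python, still at the beginner stage
This article is just my own study notes, for building up a knowledge framework and for review
It is not about how many exercises you do; understand each one you work through
Know not only what works, but also why it works!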

Introduction

In this set of exercises we will work with the Wine Reviews dataset.

Run the code below
to import the dataset and packages used in this exercise

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.indexing_selecting_and_assigning import *
print("Setup complete.")

Look at an overview of your data by running the following line.

Run the code to view the imported data

reviews.head()

reviews.head() displays the first five rows of the dataset used in these exercises.

Exercises

1.

Question

Select the description column from reviews and assign the result to the variable desc.

Solution

Task:

Extract the description column on its own and assign it to desc. The reviews dataset was already imported during setup.

desc = reviews.description

Result: desc is a pandas Series containing the description of every review.

Alternative solution:

desc = reviews["description"]
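The two forms above should give the same Series. Here is a small sketch on a made-up DataFrame (toy and its values are my own stand-ins, not the Kaggle data) that also shows the one case where only bracket access works: a column name that is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
toy = pd.DataFrame({
    "description": ["Ripe and fruity", "Tart and snappy"],
    "taster name": ["A", "B"],  # note the space in the column name
})

by_attribute = toy.description     # attribute-style access
by_bracket = toy["description"]    # dictionary-style access

# Both forms return the same Series
print(by_attribute.equals(by_bracket))  # True

# A name containing a space can only be reached with brackets
tasters = toy["taster name"]
print(tasters.tolist())  # ['A', 'B']
```

Bracket access is the safer habit when column names come from a file you do not control.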

2.

Question

Select the first value from the description column of reviews, assigning it to variable first_description.

Solution

Task:

Extract the first value in the description column

first_description = reviews.description.iloc[0]

Result: first_description is a single string, the description of the first review.

3.

Question

Select the first row of data (the first record) from reviews, assigning it to the variable first_row.

Solution

Task:

Extract the first row of the reviews data

first_row = reviews.iloc[0]

Result: first_row is a Series whose index holds the column names of reviews.
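Exercises 2 and 3 both use iloc[0], but on different objects. A minimal sketch, assuming a tiny made-up DataFrame (toy is hypothetical, and its index labels are chosen on purpose to differ from the positions):

```python
import pandas as pd

# Hypothetical data; labels 10 and 20 deliberately differ from positions 0 and 1
toy = pd.DataFrame(
    {"country": ["Italy", "Portugal"], "points": [87, 90]},
    index=[10, 20],
)

first_value = toy.country.iloc[0]  # iloc on a Series -> a single value
first_row = toy.iloc[0]            # iloc on a DataFrame -> a Series for that row

print(first_value)              # Italy
print(list(first_row.index))    # ['country', 'points']
print(first_row.name)           # 10
```

The row comes back as a Series indexed by column name, and its name attribute keeps the original index label.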

4.

Question

Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions.

Hint: format your output as a pandas Series.

Solution

Task:

The first ten elements of the description column

first_descriptions = reviews.description.iloc[:10]

Result: first_descriptions is a Series holding the first ten descriptions (index labels 0 through 9).

Alternative solutions:

first_descriptions = reviews.description.head(10)

first_descriptions = reviews.loc[:9, "description"]
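The three solutions agree here only because reviews has the default 0, 1, 2, ... index. A rough sketch on made-up data (toy and odd are hypothetical names) showing where the loc version starts to differ:

```python
import pandas as pd

# Hypothetical data with the default integer index 0..24
toy = pd.DataFrame({"description": [f"wine {i}" for i in range(25)]})

a = toy.description.iloc[:10]
b = toy.description.head(10)
c = toy.loc[:9, "description"]
print(a.equals(b) and b.equals(c))  # True

# After filtering, the labels no longer match the positions
odd = toy[toy.index % 2 == 1]            # labels 1, 3, 5, ..., 23
print(len(odd.description.iloc[:10]))    # 10 (first ten rows by position)
print(len(odd.loc[:9, "description"]))   # 5  (labels 1, 3, 5, 7, 9)
```

So iloc[:10] or head(10) is the safer choice when "first ten rows" is what is meant.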

5.

Question

Select the records with index labels 1, 2, 3, 5, and 8, assigning the result to the variable sample_reviews.

In other words, generate a DataFrame containing exactly those five rows.

Solution

Task:

Extract the rows whose index labels are 1, 2, 3, 5, and 8 and assign them to sample_reviews

indices = [1, 2, 3, 5, 8]
sample_reviews = reviews.loc[indices]

Result: sample_reviews is a DataFrame with the five rows labelled 1, 2, 3, 5, and 8.
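Passing a list to loc selects by label, not by position. A small sketch with a shuffled, made-up index (toy is hypothetical) to make that visible, including what happens when a label is missing:

```python
import pandas as pd

# Hypothetical data with a shuffled index so labels != positions
toy = pd.DataFrame({"points": [87, 90, 85, 92]}, index=[3, 1, 8, 5])

picked = toy.loc[[1, 3, 8]]          # rows come back in the order requested
print(picked.index.tolist())         # [1, 3, 8]
print(picked.points.tolist())        # [90, 87, 85]

by_position = toy.iloc[[1, 3]]       # positions 1 and 3, whatever their labels
print(by_position.index.tolist())    # [1, 5]

# A label that does not exist raises KeyError in current pandas versions
missing_label_raises = False
try:
    toy.loc[[1, 2]]
except KeyError:
    missing_label_raises = True
print(missing_label_raises)          # True
```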

6.

Question

Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100. In other words, generate a DataFrame with those four rows and four columns.

Solution

Task:

Extract the country, province, region_1, and region_2 columns for the rows with index labels 0, 1, 10, and 100,
and assign the result to df

cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]

Result: df is a DataFrame with four rows (labels 0, 1, 10, 100) and the four requested columns.
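loc accepts a row selector and a column selector in one call. As a rough illustration on made-up data (toy, rows, and cols are hypothetical), the single call gives the same result as selecting columns first and rows second:

```python
import pandas as pd

# Hypothetical stand-in for a few wine review columns
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "province": ["Sicily & Sardinia", "Douro", "Oregon"],
    "points": [87, 87, 87],
})

rows = [0, 2]
cols = ["country", "province"]

one_step = toy.loc[rows, cols]    # rows and columns in a single call
two_steps = toy[cols].loc[rows]   # columns first, then rows (chained)

print(one_step.equals(two_steps))  # True
print(one_step.shape)              # (2, 2)
```

For reading data the two are interchangeable; the single loc call is the form to prefer once you start assigning values, because chained indexing may write to a temporary copy.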

7.

Question

Create a variable df containing the country and variety columns of the first 100 records.

Hint: you may use loc or iloc. When working on the answer to this question and several of the ones that follow, keep in mind the following “gotcha” described in the tutorial:

iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded.
loc, meanwhile, indexes inclusively.

This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].
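The counts in this gotcha are easy to check yourself. A minimal sketch, assuming a made-up DataFrame with the default integer index (toy is hypothetical):

```python
import pandas as pd

# Hypothetical data with the default index 0..1999
toy = pd.DataFrame({"points": range(2000)})

print(len(toy.iloc[0:1000]))  # 1000 (end position excluded)
print(len(toy.loc[0:1000]))   # 1001 (end label included)
print(len(toy.loc[0:999]))    # 1000 (one lower to match iloc)
```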

Solution

Task:

Extract the country and variety columns for the first 100 rows and assign the result to df

cols = ['country', 'variety']
df = reviews.loc[0:99, cols]

Alternative solution:

cols_idx = [0, 11]
df = reviews.iloc[:100, cols_idx]
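The positions 0 and 11 in the iloc version depend on the column order of this particular CSV. One way to avoid hard-coding them, shown here on a made-up three-column frame (toy is hypothetical, so variety sits at position 2 rather than 11), is to look the positions up by name with Index.get_indexer:

```python
import pandas as pd

# Hypothetical frame; the real dataset has more columns in between
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 87],
    "variety": ["White Blend", "Portuguese Red", "Pinot Gris"],
})

# Look up integer positions from column names instead of hard-coding them
cols_idx = toy.columns.get_indexer(["country", "variety"])
print(cols_idx.tolist())  # [0, 2]

by_position = toy.iloc[:2, cols_idx]
by_label = toy.loc[:1, ["country", "variety"]]
print(by_position.equals(by_label))  # True
```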

8.

Question

Create a DataFrame italian_wines containing reviews of wines made in Italy.

Hint: reviews.country equals what?

Solution

Task:

Extract all records whose country column equals 'Italy'

italian_wines = reviews[reviews.country == 'Italy']

Result: italian_wines is a DataFrame containing only the rows whose country is Italy.
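It helps to look at the intermediate step: the comparison produces a boolean Series, and indexing with it keeps the True rows. A small sketch on made-up data (toy and mask are hypothetical names; one country is left missing on purpose):

```python
import pandas as pd

# Hypothetical data; the last row has no country recorded
toy = pd.DataFrame({
    "country": ["Italy", "US", "Italy", None],
    "points": [87, 90, 95, 88],
})

mask = toy.country == "Italy"   # one True/False per row
print(mask.tolist())            # [True, False, True, False]

italian = toy[mask]             # keep only the rows where mask is True
print(italian.index.tolist())   # [0, 2]
```

A missing country compares as False, so such rows are simply dropped from the result.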

9.

Question

Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.

Solution

Task:

Extract all records whose country is Australia or New Zealand and whose points value is at least 95

top_oceania_wines = reviews.loc[
    (reviews.country.isin(['Australia', 'New Zealand']))
    & (reviews.points >= 95)
]

Result: top_oceania_wines is a DataFrame of the Australian and New Zealand wines that scored 95 points or more.
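isin is shorthand for several == comparisons joined with |. A minimal sketch on made-up data (toy is hypothetical) comparing the two spellings. Note the parentheses around each condition: & and | bind more tightly than == and >=, so leaving them out changes the meaning.

```python
import pandas as pd

# Hypothetical data
toy = pd.DataFrame({
    "country": ["Australia", "New Zealand", "Italy", "Australia"],
    "points": [96, 94, 97, 95],
})

with_isin = toy.loc[
    toy.country.isin(["Australia", "New Zealand"])
    & (toy.points >= 95)
]

with_or = toy.loc[
    ((toy.country == "Australia") | (toy.country == "New Zealand"))
    & (toy.points >= 95)
]

print(with_isin.equals(with_or))   # True
print(with_isin.index.tolist())    # [0, 3]
```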

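Closing

This article is only a set of study notes, recording the journey from 0 to 1

I hope it helps. If you find any mistakes, corrections are welcome.

I'm 海轰 ଘ(੭ˊᵕˋ)੭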
