[Python | Kaggle] Machine Learning Series: Pandas Basics Exercises (Part 2)


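Preface

Hello, everyone!
Thank you very much for reading Haihong's (海轰) article. If you spot any mistakes, please point them out~
 
About me ଘ(੭ˊᵕˋ)੭
Nickname: Haihong (海轰)
Tags: programmer | C++ user | student
Bio: I came to programming through C and later switched to a computer science major. I have been fortunate enough to win some national and provincial awards, and I have been recommended for admission to graduate school. I am currently learning C++/Linux/Python.
Study tips: solid fundamentals + lots of notes + lots of coding + lots of thinking + good English!
 
Learning Python, still at the beginner stage
This article is only my own study notes, used to build up a knowledge framework and for review.
It is not about doing many problems: learn one, understand one.
Know what works, and know why it works!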

Introduction

In this set of exercises we will work with the Wine Reviews dataset.

Run the code below to import the dataset and the packages used in this exercise.

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.indexing_selecting_and_assigning import *
print("Setup complete.")

Look at an overview of your data by running the following line.

Run the code below to view the imported data.

reviews.head()

The data used in the exercises looks like this:

[Image: preview of the reviews DataFrame]

Exercises

1.

Question

Select the description column from reviews and assign the result to the variable desc.

Solution

Task:

# reviews is the dataset in use; it was already imported at the start
Extract the description column on its own and assign it to desc.

desc = reviews.description

Output:

[Image: screenshot of the output]

Alternative solution:

desc = reviews["description"]
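The two forms above should behave the same for a column like description. As a minimal sketch, assuming a small made-up DataFrame named toy (not the real wine data), the following compares them and shows one case where only the bracket form is usable:

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews table.
toy = pd.DataFrame({
    "description": ["Ripe and fruity", "Tart and snappy"],
    "tasting notes": ["cherry", "lime"],  # column name containing a space
})

by_attribute = toy.description      # attribute access
by_brackets = toy["description"]    # bracket access

# Both forms return the same pandas Series.
same = by_attribute.equals(by_brackets)

# A name containing a space cannot be written as an attribute,
# so only the bracket form works for this column.
notes = toy["tasting notes"]
```

Bracket access is usually the safer habit, since it also works for names with spaces or names that clash with DataFrame methods.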

2.

Question

Select the first value from the description column of reviews, assigning it to variable first_description.

Solution

Task:

Extract the first value in the description column.

first_description = reviews.description.iloc[0]

Output:

[Image: screenshot of the output]

3.

Question

Select the first row of data (the first record) from reviews, assigning it to the variable first_row.

Solution

Task:

Extract the first row of reviews.

first_row = reviews.iloc[0]

Output:

[Image: screenshot of the output]
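Exercises 2 and 3 both use iloc[0], but the result type differs. A small sketch, assuming a made-up two-row DataFrame named toy, may make the difference easier to see:

```python
import pandas as pd

# Hypothetical toy data; the column names mimic the wine reviews table.
toy = pd.DataFrame({"country": ["Italy", "Portugal"], "points": [87, 90]})

# iloc[0] on a DataFrame returns the whole first row as a Series
# whose index is the column names.
first_row = toy.iloc[0]

# iloc[0] on a single column (a Series) returns one scalar value.
first_value = toy.country.iloc[0]
```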

4.

Question

Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions.

Hint: format your output as a pandas Series.

Solution

Task:

Extract the first ten elements of the description column.

first_descriptions = reviews.description.iloc[:10]

Output:

[Image: screenshot of the output]

Alternative solutions:

first_descriptions = reviews.description.head(10)

first_descriptions = reviews.loc[:9, "description"]
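The three answers agree only because the index of reviews runs 0, 1, 2, and so on. A rough sketch, assuming a made-up 20-row DataFrame named toy, checks the equivalence and then shows what changes once the row labels are shifted:

```python
import pandas as pd

# Hypothetical toy data; the default index is 0..19.
toy = pd.DataFrame({"description": [f"wine {i}" for i in range(20)]})

by_iloc = toy.description.iloc[:10]    # positions 0..9, end excluded
by_head = toy.description.head(10)     # first 10 rows
by_loc = toy.loc[:9, "description"]    # labels 0..9, end included
all_equal = by_iloc.equals(by_head) and by_iloc.equals(by_loc)

# Relabel the rows 100..119: positions stay the same, labels do not.
relabeled = toy.copy()
relabeled.index = range(100, 120)

# The position-based slice is unchanged.
n_by_position = len(relabeled.description.iloc[:10])
# The label-based slice now needs the new end label 109.
n_by_label = len(relabeled.loc[:109, "description"])
```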

5.

Question

Select the records with index labels 1, 2, 3, 5, and 8, assigning the result to the variable sample_reviews.

In other words, generate the following DataFrame:

[Image: the expected DataFrame]

Solution

Task:

Extract the rows whose index labels are 1, 2, 3, 5, and 8, and assign them to sample_reviews.

indices = [1, 2, 3, 5, 8]
sample_reviews = reviews.loc[indices]

Output:

[Image: screenshot of the output]
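Passing a list to loc selects by label, and the rows come back in the order of the list. A minimal sketch, assuming a made-up DataFrame named toy with non-default labels:

```python
import pandas as pd

# Hypothetical toy data with labels 10, 20, 30, 40 instead of 0..3.
toy = pd.DataFrame({"title": ["A", "B", "C", "D"]}, index=[10, 20, 30, 40])

# loc looks rows up by label and keeps the order of the list.
picked = toy.loc[[30, 10]]

# The same rows by position would be positions 2 and 0.
by_position = toy.iloc[[2, 0]]
same = picked.equals(by_position)
```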

6.

Question

Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100. In other words, generate the following DataFrame:

[Image: the expected DataFrame]

Solution

Task:

Extract the country, province, region_1, and region_2 columns for the rows with index labels 0, 1, 10, and 100, and assign the result to df.

cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]

Output:

[Image: screenshot of the output]
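loc accepts a row selector and a column selector in one call. As a sketch under the assumption of a small made-up DataFrame named toy, this compares the single call with the chained form that selects columns first:

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "province": ["Sicily", "Douro", "Oregon"],
    "points": [87, 87, 90],
})

rows = [0, 2]
cols = ["country", "province"]

# One call: row labels first, column labels second.
df = toy.loc[rows, cols]

# Chained equivalent: pick the columns, then the rows.
chained = toy[cols].loc[rows]
same = df.equals(chained)
```

The single call is generally preferred, because chained selection is harder to use safely when assigning values.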

7.

Question

Create a variable df containing the country and variety columns of the first 100 records.

Hint: you may use loc or iloc. When working on the answer to this question and several of the ones that follow, keep in mind the following “gotcha” described in the tutorial:

iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded.
loc, meanwhile, indexes inclusively.

This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] will return 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].
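The off-by-one difference is easy to confirm on a small scale. A minimal sketch, assuming a made-up 10-row DataFrame named toy with the default 0..9 index:

```python
import pandas as pd

# Hypothetical toy data with the default index 0..9.
toy = pd.DataFrame({"country": [f"country {i}" for i in range(10)]})

n_iloc = len(toy.iloc[0:5])        # positions 0..4, end excluded
n_loc = len(toy.loc[0:5])          # labels 0..5, end included
n_loc_fixed = len(toy.loc[0:4])    # go one lower to match iloc[0:5]
```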

Solution

Task:

Extract the first 100 rows of the country and variety columns and assign the result to df.

cols = ['country','variety']
df = reviews.loc[0:99,cols]

Alternative solution:

cols_idx = [0, 11]
df = reviews.iloc[:100, cols_idx]
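Hard-coded positions such as 0 and 11 depend on the column order of the file. One possible way to avoid that, sketched here on a made-up DataFrame named toy, is to look the positions up by name with Index.get_indexer:

```python
import pandas as pd

# Hypothetical toy data; only the column names mimic the real table.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "France", "Spain"],
    "points": [87, 87, 90, 91, 88],
    "variety": ["White Blend", "Portuguese Red", "Pinot Gris", "Gamay", "Tempranillo"],
})

cols = ["country", "variety"]

# Translate column names into integer positions.
positions = toy.columns.get_indexer(cols)

by_position = toy.iloc[:3, positions]   # first 3 rows, end excluded
by_label = toy.loc[0:2, cols]           # labels 0..2, end included
same = by_position.equals(by_label)
```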

8.

Question

Create a DataFrame italian_wines containing reviews of wines made in Italy.

Hint: reviews.country equals what?

Solution

Task:

Extract all records where the country column equals 'Italy'.

italian_wines = reviews[reviews.country == 'Italy']

Output:

[Image: screenshot of the output]
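The comparison inside the brackets produces a boolean Series, which then acts as a row filter. A small sketch, assuming a made-up three-row DataFrame named toy, splits the answer into those two steps:

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
toy = pd.DataFrame({
    "country": ["Italy", "US", "Italy"],
    "points": [87, 90, 92],
})

# Step 1: an element-wise comparison gives one True/False per row.
is_italy = toy.country == "Italy"

# Step 2: indexing with the mask keeps only the True rows.
italian = toy[is_italy]
```

Note that the surviving rows keep their original index labels (0 and 2 here) rather than being renumbered.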

9.

Question

Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.

Solution

Task:

Extract all records where country is Australia or New Zealand and points is greater than or equal to 95.

top_oceania_wines = reviews.loc[
    (reviews.country.isin(['Australia', 'New Zealand']))
    & (reviews.points >= 95)
]

Output:

[Image: screenshot of the output]
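isin is shorthand for several equality tests joined with |. The sketch below, on a made-up DataFrame named toy, compares the two spellings. The parentheses around each comparison matter, because & and | bind more tightly than == and >= in Python:

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
toy = pd.DataFrame({
    "country": ["Australia", "New Zealand", "Italy", "New Zealand", "Australia"],
    "points": [96, 94, 97, 95, 90],
})

# Version 1: isin for the country test.
with_isin = toy.loc[
    toy.country.isin(["Australia", "New Zealand"])
    & (toy.points >= 95)
]

# Version 2: explicit equality tests joined with | (or).
with_or = toy.loc[
    ((toy.country == "Australia") | (toy.country == "New Zealand"))
    & (toy.points >= 95)
]

same = with_isin.equals(with_or)
```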

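Conclusion

This article is only a set of study notes, recording the journey from 0 to 1.

I hope it helps. If you find any mistakes, corrections are welcome~

I am Haihong (海轰) ଘ(੭ˊᵕˋ)੭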

