Preparation

install.packages('tidyverse')

The tidyverse bundles several packages commonly used in data analysis, such as ggplot2, dplyr, and tidyr. This article uses dplyr to manipulate the data.

Workflow

# load the tidyverse package
library(tidyverse)

filter(): filtering rows

# filter(.data, ..., .preserve = FALSE)
# using the iris data
> data(iris)
# display the first six rows of the iris data (the default for head())
> head(iris)
# keep only the rows where Sepal.Length equals 5
> filter(iris, Sepal.Length == 5)

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1             5         3.6          1.4         0.2     setosa
2             5         3.4          1.5         0.2     setosa
3             5         3.0          1.6         0.2     setosa
4             5         3.4          1.6         0.4     setosa
5             5         3.2          1.2         0.2     setosa
6             5         3.5          1.3         0.3     setosa
7             5         3.5          1.6         0.6     setosa
8             5         3.3          1.4         0.2     setosa
9             5         2.0          3.5         1.0 versicolor
10            5         2.3          3.3         1.0 versicolor

> filter(iris, Sepal.Length == 5 & Sepal.Width == 3)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1            5           3          1.6         0.2  setosa

Useful filter functions

There are many functions and operators that are useful when constructing the expressions used to filter the data:

==, >, >=, etc.
&, |, !, xor()
is.na()
between(), near()
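
As a minimal sketch of how these helpers can be combined, the following uses the built-in iris data. The result names (mid_length, width_three, and so on) are illustrative and not part of the original article.

```r
library(dplyr)

# between() is inclusive on both ends: 4.9 <= Sepal.Length <= 5.1
mid_length <- filter(iris, between(Sepal.Length, 4.9, 5.1))

# near() compares floating-point numbers with a small tolerance,
# which is safer than == for values produced by arithmetic
width_three <- filter(iris, near(Sepal.Width, 3))

# | means OR; separating conditions with a comma behaves like &
not_setosa_or_wide <- filter(iris, Species != "setosa" | Sepal.Width > 4)
long_and_wide <- filter(iris, Sepal.Length > 7, Sepal.Width > 3)

# xor() keeps rows where exactly one of the two conditions is TRUE
exactly_one <- filter(iris, xor(Sepal.Length > 7, Sepal.Width > 3))
```

Separating conditions with commas and joining them with & give the same rows; the comma form tends to read better when there are many conditions.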

Attention:
filter() drops every row for which the condition evaluates to NA. To keep those rows, add an explicit is.na() condition.

> flower <- iris
> flower[1,1] <- NA
> filter(flower, is.na(Sepal.Length) | Sepal.Length == 5)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            NA         3.5          1.4         0.2     setosa
2             5         3.6          1.4         0.2     setosa
3             5         3.4          1.5         0.2     setosa
4             5         3.0          1.6         0.2     setosa
5             5         3.4          1.6         0.4     setosa
6             5         3.2          1.2         0.2     setosa
7             5         3.5          1.3         0.3     setosa
8             5         3.5          1.6         0.6     setosa
9             5         3.3          1.4         0.2     setosa
10            5         2.0          3.5         1.0 versicolor
11            5         2.3          3.3         1.0 versicolor
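
To make the NA behaviour easier to see, here is a small sketch that compares the row counts with and without the is.na() condition. The names without_na and with_na are illustrative.

```r
library(dplyr)

flower <- iris
flower[1, 1] <- NA  # Sepal.Length of the first row becomes NA

# NA > 5 evaluates to NA, and filter() drops rows whose condition is NA
without_na <- filter(flower, Sepal.Length > 5)

# adding is.na() keeps the row with the missing value
with_na <- filter(flower, is.na(Sepal.Length) | Sepal.Length > 5)

# the only difference between the two results is the NA row
nrow(with_na) - nrow(without_na)
```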

arrange(): sorting rows

# sort by the Petal.Width column, then by the Species column
> arrange(iris, Petal.Width, Species)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            4.9         3.1          1.5         0.1     setosa
2            4.8         3.0          1.4         0.1     setosa
3            4.3         3.0          1.1         0.1     setosa
4            5.2         4.1          1.5         0.1     setosa
5            4.9         3.6          1.4         0.1     setosa
...
47           5.4         3.4          1.5         0.4     setosa
48           5.1         3.8          1.9         0.4     setosa
49           5.1         3.3          1.7         0.5     setosa
50           5.0         3.5          1.6         0.6     setosa
51           4.9         2.4          3.3         1.0 versicolor
52           5.0         2.0          3.5         1.0 versicolor
53           6.0         2.2          4.0         1.0 versicolor
...
# Wrap a column in desc() to sort it in descending order.
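
A minimal sketch of descending order with desc(), assuming the same iris data; sorted_desc is an illustrative name.

```r
library(dplyr)

# sort by Petal.Width from largest to smallest, breaking ties by Species
sorted_desc <- arrange(iris, desc(Petal.Width), Species)

# the first rows now hold the widest petals
head(sorted_desc, 3)
```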

select(): selecting columns

# select the Petal.Width column and Species column
> select(iris, Petal.Width, Species)
# select every column from Petal.Width through Species
> select(iris, Petal.Width:Species)
# select every column except those from Petal.Width through Species
> select(iris, -c(Petal.Width:Species))

Useful selection skills

Overview of selection features
Tidyverse selections implement a dialect of R where operators make it easy to select variables:

: for selecting a range of consecutive variables.
! for taking the complement of a set of variables.
& and | for selecting the intersection or the union of two sets of variables.
c() for combining selections.
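
The operators above can be sketched on iris as follows. The result names are illustrative, and starts_with() and ends_with() are selection helpers from the same selection language.

```r
library(dplyr)

# ! takes the complement: every column except Species
no_species <- select(iris, !Species)

# : selects a consecutive range, and c() combines it with another column
sepal_and_species <- select(iris, c(Sepal.Length:Sepal.Width, Species))

# & intersects two selections: starts with "Petal" AND ends with "Width"
petal_width <- select(iris, starts_with("Petal") & ends_with("Width"))

# | unions two selections: starts with "Sepal" OR ends with "Width"
sepal_or_width <- select(iris, starts_with("Sepal") | ends_with("Width"))
```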

In addition, you can use selection helpers. Some helpers select specific columns:

everything(): Matches all variables.
last_col(): Select last variable, possibly with an offset.

These helpers select variables by matching patterns in their names:

starts_with(): Starts with a prefix.
ends_with(): Ends with a suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.
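
Here is a short sketch of the pattern-matching helpers. Since iris has no numbered columns, num_range() is shown on a small made-up data frame (scores), which is purely hypothetical.

```r
library(dplyr)

sepal_cols  <- select(iris, starts_with("Sepal"))  # Sepal.Length, Sepal.Width
width_cols  <- select(iris, ends_with("Width"))    # Sepal.Width, Petal.Width
length_cols <- select(iris, contains("Length"))    # Sepal.Length, Petal.Length

# matches() takes a regular expression
regex_cols <- select(iris, matches("^(Sepal|Petal)\\.W"))

# hypothetical data frame with numbered columns x01, x02, x03
scores <- data.frame(id = 1:2, x01 = c(1, 2), x02 = c(3, 4), x03 = c(5, 6))

# width = 2 pads the numbers with zeros, so 1:2 matches x01 and x02
first_two <- select(scores, num_range("x", 1:2, width = 2))
```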

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
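
The difference between all_of() and any_of() shows up when a name is missing. In this sketch, "Stem.Length" is a deliberately nonexistent column, and wanted, kept, and result are illustrative names.

```r
library(dplyr)

# "Stem.Length" does not exist in iris; it is included on purpose
wanted <- c("Species", "Petal.Width", "Stem.Length")

# any_of() silently skips the missing name
kept <- select(iris, any_of(wanted))
names(kept)

# all_of() raises an error because "Stem.Length" is absent;
# tryCatch() turns that error into a plain string for inspection
result <- tryCatch(
  select(iris, all_of(wanted)),
  error = function(e) "error"
)
```

any_of() is convenient for dropping columns that may or may not be present, for example select(iris, -any_of(wanted)).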

This helper selects variables with a function:

where(): Applies a function to all variables and selects those for which the function returns TRUE.
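
As a sketch of where(), the predicate can be an existing function such as is.numeric or a custom function. The 3.5 threshold below is arbitrary and chosen only for illustration.

```r
library(dplyr)

# keep only the numeric columns (drops the Species factor)
numeric_cols <- select(iris, where(is.numeric))

# keep only the factor columns
factor_cols <- select(iris, where(is.factor))

# custom predicate: numeric columns whose mean exceeds 3.5
large_cols <- select(iris, where(function(x) is.numeric(x) && mean(x) > 3.5))
```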

mutate(): creating new variables

iris_part <- mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width)

Attention: If you want to keep only the new variables, use the transmute() function.
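
A minimal sketch contrasting the two functions on iris; area_only is an illustrative name. Recent dplyr releases also offer mutate(..., .keep = "none") for the same purpose, so check the documentation for your installed version.

```r
library(dplyr)

# mutate() appends the new column to the existing five
iris_part <- mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width)
ncol(iris_part)

# transmute() returns only the columns it creates
area_only <- transmute(iris, Sepal.Area = Sepal.Length * Sepal.Width)
names(area_only)
```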

Reference

https://dplyr.tidyverse.org/

Making R work like Excel, Part 1