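I give up: I rewrote the previous installment twice because of Ctrl+Z, then twice more because it would not submit... so I am splitting it into parts.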

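Select or re-order columns

Use select() from dplyr to choose the columns you want to keep and to set their order in the data frame.

Note: in the examples below, the linelist data frame is modified with select() and displayed, but not saved. This is for demonstration purposes. The modified column names are printed by piping the data frame to names().

Here are all the column names in the linelist at this point in the cleaning pipe chain: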

names(linelist)
##  [1] "case_id"              "generation"           "date_infection"       "date_onset"           "date_hospitalisation" "date_outcome"        
##  [7] "outcome"              "gender"               "hospital"             "lon"                  "lat"                  "infector"            
## [13] "source"               "age"                  "age_unit"             "row_num"              "wt_kg"                "ht_cm"               
## [19] "ct_blood"             "fever"                "chills"               "cough"                "aches"                "vomit"               
## [25] "temp"                 "time_admission"       "merged_header"        "x28"

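Keep columns

Select only the columns you want to keep by putting their names in the select() command, without quotation marks. They will appear in the data frame in the order you provide. Note that if you include a column that does not exist, R will return an error (see the use of any_of() below if you do not want an error in this situation).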

# linelist dataset is piped through select() command, and names() prints just the column names
linelist %>% 
  select(case_id, date_onset, date_hospitalisation, fever) %>% 
  names()  # display the column names
## [1] "case_id"              "date_onset"           "date_hospitalisation" "fever"
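The two behaviours described above (columns come back in the order listed, and a missing column stops the code) can be checked on a small data frame. This is only a sketch: demo is a hypothetical stand-in for linelist, and date_death is a deliberately non-existent column.

```r
library(dplyr)

# Hypothetical toy data standing in for linelist
demo <- data.frame(
  case_id    = c("a1", "b2", "c3"),
  date_onset = as.Date(c("2014-05-01", "2014-05-03", "2014-05-08")),
  fever      = c("yes", "no", "yes")
)

# Columns are returned in the order they are listed
kept <- demo %>% select(fever, case_id)
names(kept)

# date_death does not exist, so select() raises an error;
# tryCatch() is used here only so the script keeps running
result <- tryCatch(
  demo %>% select(case_id, date_death),
  error = function(e) "error: column not found"
)
result
```

In a real cleaning chain you would normally let the error stop the code, since it usually signals a typo or an unexpected change in the raw data.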

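"tidyselect" helper functions

These helper functions make it easy to specify columns to keep, discard, or transform. They come from the tidyselect package, which is included in tidyverse and underlies how columns are selected in dplyr functions.

For example, if you want to re-order the columns, everything() is a useful function that means "all other columns not yet mentioned". The command below moves the columns date_onset and date_hospitalisation to the beginning (left side) of the dataset, and keeps all other columns after them. Note that everything() is written with empty parentheses: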

# move date_onset and date_hospitalisation to beginning
linelist %>% 
  select(date_onset, date_hospitalisation, everything()) %>% 
  names()
##  [1] "date_onset"           "date_hospitalisation" "case_id"              "generation"           "date_infection"       "date_outcome"        
##  [7] "outcome"              "gender"               "hospital"             "lon"                  "lat"                  "infector"            
## [13] "source"               "age"                  "age_unit"             "row_num"              "wt_kg"                "ht_cm"               
## [19] "ct_blood"             "fever"                "chills"               "cough"                "aches"                "vomit"               
## [25] "temp"                 "time_admission"       "merged_header"        "x28"

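Here are other "tidyselect" helper functions, which also work within dplyr functions such as select(), across(), and summarise():

everything() - all other columns not mentioned
last_col() - the last column
where() - applies a function to all columns and selects those for which it returns TRUE
contains() - columns whose name contains a character string
- Example: select(contains("time"))
starts_with() - matches a specified prefix
- Example: select(starts_with("date_"))
ends_with() - matches a specified suffix
- Example: select(ends_with("_post"))
matches() - applies a regular expression (regex)
- Example: select(matches("[pt]al"))
num_range() - a numerical range such as x01, x02, x03
any_of() - matches a column if it exists, but returns no error if it is not found
- Example: select(any_of(c("date_onset", "date_death", "cardiac_arrest")))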
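Several of the helpers listed above have no worked example in this section, so here is a minimal sketch of them. The data frame demo and its column names (x01, temp_pre, and so on) are made up for illustration and are not part of linelist.

```r
library(dplyr)

# Hypothetical toy data; the column names are chosen only to exercise the helpers
demo <- data.frame(
  case_id      = c("a1", "b2"),
  date_onset   = as.Date(c("2014-05-01", "2014-05-03")),
  date_outcome = as.Date(c("2014-05-10", "2014-05-12")),
  x01          = c(1, 2),
  x02          = c(3, 4),
  x03          = c(5, 6),
  temp_pre     = c(36.8, 38.1),
  temp_post    = c(36.6, 37.2)
)

# Prefix and suffix matching
by_prefix <- demo %>% select(starts_with("date_")) %>% names()
by_suffix <- demo %>% select(ends_with("_post")) %>% names()

# num_range(): prefix plus zero-padded numbers, here x01 and x02
by_range <- demo %>% select(num_range("x", 1:2, width = 2)) %>% names()

# last_col(): whichever column is currently last
by_last <- demo %>% select(last_col()) %>% names()

by_prefix
by_suffix
by_range
by_last
```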

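In addition, you can use ordinary operators: c() to list several columns, : for consecutive columns, ! for the opposite (negation), & for AND, and | for OR.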
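As a rough illustration of these operators inside select(), consider the sketch below. The data frame demo is a hypothetical stand-in for linelist, reusing a few of its column names.

```r
library(dplyr)

# Hypothetical toy data standing in for linelist
demo <- data.frame(
  case_id        = c("a1", "b2"),
  date_infection = as.Date(c("2014-04-28", "2014-05-01")),
  date_onset     = as.Date(c("2014-05-01", "2014-05-03")),
  age            = c(34, 7),
  wt_kg          = c(70, 22),
  ht_cm          = c(172, 118)
)

# c() lists several columns; : takes a run of consecutive columns
listed <- demo %>% select(c(case_id, age:ht_cm)) %>% names()

# ! negates a selection: every column that is NOT numeric
negated <- demo %>% select(!where(is.numeric)) %>% names()

# & keeps columns that satisfy both conditions
both <- demo %>% select(starts_with("date_") & ends_with("onset")) %>% names()

# | keeps columns that satisfy either condition
either <- demo %>% select(starts_with("date_") | ends_with("_kg")) %>% names()

listed
negated
both
either
```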

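Use where() to specify logical criteria for columns. If you provide a function inside where(), do not include the function's empty parentheses. The command below selects columns that are of class numeric.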

# select columns that are class Numeric
linelist %>% 
  select(where(is.numeric)) %>% 
  names()
## [1] "generation" "lon"        "lat"        "row_num"    "wt_kg"      "ht_cm"      "ct_blood"   "temp"
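where() also accepts a function you write yourself, as long as it returns a single TRUE or FALSE for each column. The sketch below assumes a toy data frame demo and a hypothetical helper has_missing(); neither appears in the original text.

```r
library(dplyr)

# Hypothetical toy data standing in for linelist
demo <- data.frame(
  case_id = c("a1", "b2", "c3"),
  age     = c(34, 7, NA),
  wt_kg   = c(70, 22, 55),
  fever   = c("yes", NA, "no")
)

# A built-in predicate, passed without parentheses
char_cols <- demo %>% select(where(is.character)) %>% names()

# Hypothetical custom predicate: TRUE if the column has any missing value
has_missing <- function(x) any(is.na(x))
na_cols <- demo %>% select(where(has_missing)) %>% names()

char_cols
na_cols
```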

Use contains() to select only the columns whose name contains a specified character string. ends_with() and starts_with() provide more nuance.

# select columns containing certain characters
linelist %>% 
  select(contains("date")) %>% 
  names()
## [1] "date_infection"       "date_onset"           "date_hospitalisation" "date_outcome"

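The function matches() works similarly to contains() but can be given a regular expression (see the Characters and strings page), such as multiple strings separated by OR bars within the parentheses: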

# searched for multiple character matches
linelist %>% 
  select(matches("onset|hosp|fev")) %>%   # note the OR symbol "|"
  names()
## [1] "date_onset"           "date_hospitalisation" "hospital"             "fever"

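Note: if a column name that you specifically provide does not exist in the data, select() returns an error and stops your code. Consider using any_of() to refer to columns that may or may not exist, which is especially useful in negative (remove) selections.

In the example below only one of these columns exists, but no error is produced and the code continues without stopping your cleaning chain.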

linelist %>% 
  select(any_of(c("date_onset", "village_origin", "village_detection", "village_residence", "village_travel"))) %>% 
  names()
## [1] "date_onset"
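The negative (remove) case mentioned above could look like the following sketch. The data frame demo and the vector cols_to_drop are hypothetical; village_origin is deliberately absent from the data.

```r
library(dplyr)

# Hypothetical toy data standing in for linelist
demo <- data.frame(
  case_id    = c("a1", "b2"),
  date_onset = as.Date(c("2014-05-01", "2014-05-03")),
  fever      = c("yes", "no")
)

# Columns to remove if present; village_origin does not exist in demo
cols_to_drop <- c("fever", "village_origin")

# -any_of() drops the columns that exist and silently ignores the rest
remaining <- demo %>% select(-any_of(cols_to_drop)) %>% names()
remaining
```

This pattern is handy when the same cleaning script runs on several versions of a dataset whose columns vary slightly.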

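Remove columns

Indicate which columns to remove by placing a minus sign "-" in front of the column name (e.g. select(-outcome)) or in front of a vector of column names (as below). All other columns will be retained.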

linelist %>% 
  select(-c(date_onset, fever:vomit)) %>% # remove date_onset and all columns from fever to vomit
  names()
##  [1] "case_id"              "generation"           "date_infection"       "date_hospitalisation" "date_outcome"         "outcome"             
##  [7] "gender"               "hospital"             "lon"                  "lat"                  "infector"             "source"              
## [13] "age"                  "age_unit"             "row_num"              "wt_kg"                "ht_cm"                "ct_blood"            
## [19] "temp"                 "time_admission"       "merged_header"        "x28"

You can also remove a column using base R syntax, by defining it as NULL. For example:

linelist$date_onset <- NULL   # deletes column with base R syntax
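Unlike the piped select() examples, which only display their result, the base R assignment changes the data frame immediately. A minimal sketch on a hypothetical toy data frame (demo and trimmed are illustrative names) shows this, along with dropping several columns at once:

```r
# Hypothetical toy data standing in for linelist
demo <- data.frame(
  case_id    = c("a1", "b2"),
  date_onset = as.Date(c("2014-05-01", "2014-05-03")),
  fever      = c("yes", "no"),
  chills     = c("no", "no")
)

# Work on a copy so the original stays available for comparison
trimmed <- demo
trimmed$date_onset <- NULL             # drop a single column
trimmed[c("fever", "chills")] <- NULL  # drop several columns by name

names(trimmed)  # only case_id is left
names(demo)     # the original is unchanged
```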

R for Applied Epidemiology and Public Health - 7. Data Cleaning - 2