x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df = data.frame(x, y)
x y
1 aa 101
2 aa 102
3 aa 113
4 bb 201
5 cc 202
6 cc 344
7 cc 407
#filter out rows that contain 'Guard' in the player column
df %>% filter(!grepl('Guard', player))
player points rebounds
1 S Forward 19 7
2 P Forward 22 12
3 Center 32 11
Pipe functions together
We created multiple new data objects during our explorations of dplyr functions, above. While this works, we can produce the same results more efficiently by chaining functions together and creating only one new data object that encapsulates all of the previously sought information: filter on only females, grepl to get only Peromyscus spp., group_by individual species, and summarise the numbers of individuals.
# combine several functions to get a summary of the numbers of individuals of
# female Peromyscus species in our dataset.
# remember %>% are "pipes" that allow us to pass information from one function
# to the next.
dataBySpFem <- myData %>%
filter(grepl('Peromyscus', scientificName), sex == "F") %>%
group_by(scientificName) %>%
summarise(n_individuals = n())
## `summarise()` ungrouping output (override with `.groups` argument)
# view the data
dataBySpFem
## # A tibble: 3 x 2
## scientificName n_individuals
## <chr> <int>
## 1 Peromyscus leucopus 455
## 2 Peromyscus maniculatus 98
## 3 Peromyscus sp. 5
Cool!
Base R only
So that is nice, but we had to install a new package dplyr. You might ask, "Is it really worth it to learn new commands if I can do this is base R." While we think "yes", why don't you see for yourself. Here is the base R code needed to accomplish the same task.
# For reference, the same output but using R's base functions
# First, subset the data to only female Peromyscus
dataFemPero <- myData[myData$sex == 'F' &
grepl('Peromyscus', myData$scientificName), ]
# Option 1 --------------------------------
# Use aggregate and then rename columns
dataBySpFem_agg <-aggregate(dataFemPero$sex ~ dataFemPero$scientificName,
data = dataFemPero, FUN = length)
names(dataBySpFem_agg) <- c('scientificName', 'n_individuals')
# view output
dataBySpFem_agg
## scientificName n_individuals
## 1 Peromyscus leucopus 455
## 2 Peromyscus maniculatus 98
## 3 Peromyscus sp. 5
# Option 2 --------------------------------------------------------
# Do it by hand
# Get the unique scientificNames in the subset
sppInDF <- unique(dataFemPero$scientificName[!is.na(dataFemPero$scientificName)])
# Use a loop to calculate the numbers of individuals of each species
sciName <- vector(); numInd <- vector()
for (i in sppInDF) {
sciName <- c(sciName, i)
numInd <- c(numInd, length(which(dataFemPero$scientificName==i)))
}
#Create the desired output data frame
dataBySpFem_byHand <- data.frame('scientificName'=sciName,
'n_individuals'=numInd)
# view output
dataBySpFem_byHand
## scientificName n_individuals
## 1 Peromyscus leucopus 455
## 2 Peromyscus maniculatus 98
## 3 Peromyscus sp. 5