Chapter 15 The apply() family
R provides the apply() family of functions, which iterate over the parts of an object and apply a function to each of them, minimizing the need to write explicit loops.
15.1 apply()
Suppose we have a height matrix containing heights (in metres) measured on five individuals (rows) at four different times (columns).
# Simulate 20 heights between 1.5 and 2 m; the outer parentheses print the result
(height <- matrix(runif(20, 1.5, 2), nrow = 5, ncol = 4))
##          [,1]     [,2]     [,3]     [,4]
## [1,] 1.544362 1.551728 1.638139 1.657616
## [2,] 1.568389 1.941105 1.620002 1.565286
## [3,] 1.874391 1.666640 1.801649 1.661404
## [4,] 1.624035 1.583698 1.919128 1.756625
## [5,] 1.815484 1.503837 1.799496 1.798731
We would like to obtain the average height at each time step.
One option is to use a for() {} loop that iterates over columns 1 to 4, calls mean() to average the values in each column, and stores each result in a vector.
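The loop approach described above could be sketched as follows. This is a minimal illustration: to keep it self-contained it recreates a random height matrix, so its numbers will differ from the output printed earlier, and the name avgHeightLoop is only illustrative.

```r
# Recreate a 5 x 4 height matrix (individuals in rows, times in columns)
height <- matrix(runif(20, 1.5, 2), nrow = 5, ncol = 4)

# Pre-allocate a numeric vector with one slot per column (time step)
avgHeightLoop <- numeric(ncol(height))

# Iterate over columns 1 to 4 and store the mean of each column
for (i in seq_len(ncol(height))) {
  avgHeightLoop[i] <- mean(height[, i])
}
avgHeightLoop
```

Pre-allocating the result vector, rather than growing it inside the loop, avoids copying it on every iteration.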
Alternatively, we can use the apply() function to apply mean() to every column of the height matrix. See the example below:
apply(X = height, MARGIN = 2, FUN = mean)
## [1] 1.685332 1.649402 1.755683 1.687933
The apply() function takes three main arguments: X, which takes a matrix or a data frame (a data frame is coerced to a matrix with as.matrix()); MARGIN, which takes 1 for row-wise computations or 2 for column-wise computations; and FUN, which can be any function, applied to each margin of X.
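A sketch of the row-wise case (MARGIN = 1) follows, giving one average per individual. Assuming a freshly simulated height matrix, it also shows that arguments placed after FUN (here na.rm) are passed on to FUN, and that FUN can be an anonymous function. The missing value at position [2, 3] is hypothetical and inserted only for illustration.

```r
# Recreate a 5 x 4 height matrix so the snippet runs on its own
height <- matrix(runif(20, 1.5, 2), nrow = 5, ncol = 4)

# MARGIN = 1: one mean per row, i.e. one average height per individual
avgPerIndividual <- apply(X = height, MARGIN = 1, FUN = mean)

# Arguments after FUN are forwarded to it; na.rm = TRUE skips missing values
heightNA <- height
heightNA[2, 3] <- NA  # hypothetical missing measurement
withNA    <- apply(heightNA, 1, mean)                # row 2 becomes NA
skippedNA <- apply(heightNA, 1, mean, na.rm = TRUE)  # row 2 averages its 3 values

# FUN can also be an anonymous function, e.g. the spread of each individual's heights
spread <- apply(height, 1, function(x) max(x) - min(x))
```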
15.2 lapply()
lapply() applies a function to every element of a list.
The output is also a list (explaining the “l” in lapply) with the same number of elements as the object passed to it.
SimulatedData <- list(SimpleSequence = 1:4, Norm10 = rnorm(10),
                      Norm20 = rnorm(20, 1), Norm100 = rnorm(100, 5))
# Apply mean to each element of the list
lapply(X = SimulatedData, FUN = mean)
## $SimpleSequence
## [1] 2.5
##
## $Norm10
## [1] 0.1959076
##
## $Norm20
## [1] 1.051744
##
## $Norm100
## [1] 5.00134
If lapply() is given an object that is not a list, the object is first coerced to a list via base::as.list().
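As a small sketch of that behaviour, the example below passes an atomic vector to lapply(), and then a data frame, which R already stores as a list of columns. The object names squares and colAverages are illustrative.

```r
# An atomic vector is not a list, so lapply() coerces it with as.list() first;
# the result is still a list with one element per input value
squares <- lapply(1:3, function(x) x^2)
str(squares)

# A data frame is already a list of columns, so lapply() works column by column
colAverages <- lapply(mtcars[, c("mpg", "hp")], mean)
colAverages
```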
15.3 sapply()
sapply() is a ‘wrapper’ function for lapply() that simplifies the output where possible, returning a vector (or a matrix) instead of a list.
SimulatedData <- list(SimpleSequence = 1:4, Norm10 = rnorm(10),
                      Norm20 = rnorm(20, 1), Norm100 = rnorm(100, 5))
# Apply mean to each element of the list
sapply(SimulatedData, mean)
## SimpleSequence         Norm10         Norm20        Norm100
##      2.5000000      0.2365739      1.1343476      4.9551218
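The simplification done by sapply() depends on what FUN returns. In the sketch below, which uses a small hypothetical list called values, range() returns two numbers per element, so the results are combined into a matrix with one column per element. When the results have different lengths, sapply() cannot simplify them and returns a list, just like lapply().

```r
# Small fixed list (hypothetical data) so the result is predictable
values <- list(a = 1:4, b = c(10, 20, 30))

# Every call to range() returns 2 numbers, so sapply() returns a 2 x 2 matrix
ranges <- sapply(values, range)
ranges

# Results of different lengths (2 and 3) cannot be simplified, so a list is returned
bigValues <- sapply(values, function(x) x[x > 2])
```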
15.4 mapply()
mapply() works as a multivariate version of sapply().
It applies a given function to the first elements of each argument, then to the second elements, and so on. For example:
lilySeeds <- c(80, 65, 89, 23, 21)
poppySeeds <- c(20, 35, 11, 77, 79)
# Add the two vectors element by element
mapply(sum, lilySeeds, poppySeeds)
## [1] 100 100 100 100 100
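mapply() is not limited to built-in functions such as sum(). The sketch below, reusing the seed vectors, passes an anonymous function of two arguments to compute the share of lily seeds in each pair, and then shows that, as with sapply(), results of unequal length are returned as a list. The names lilyShare and reps are illustrative.

```r
lilySeeds <- c(80, 65, 89, 23, 21)
poppySeeds <- c(20, 35, 11, 77, 79)

# Each call receives one element of each vector: x from lilySeeds, y from poppySeeds
lilyShare <- mapply(function(x, y) x / (x + y), lilySeeds, poppySeeds)
lilyShare  # 0.80 0.65 0.89 0.23 0.21

# rep(1, 3), rep(2, 2), rep(3, 1) have different lengths, so a list is returned
reps <- mapply(rep, 1:3, 3:1)
```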
15.5 tapply()
tapply() is used to apply a function over subsets of a vector.
It is primarily used when the dataset contains different groups (i.e. the levels of a factor) and we want to apply a function to each of these groups. For example, using the built-in mtcars dataset:
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# get the mean hp by cylinder groups
tapply(mtcars$hp, mtcars$cyl, FUN = mean)
## 4 6 8
## 82.63636 122.28571 209.21429
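tapply() can also group by more than one factor if the grouping vectors are supplied as a list; the result is then a matrix with one cell per combination of levels. The sketch below, whose object names are illustrative, groups horsepower by number of cylinders (rows) and by transmission type (am, columns, where 0 is automatic and 1 is manual), and then uses length() as FUN to count the cars in each cylinder group.

```r
# Mean hp for every combination of cylinders (rows) and transmission (columns)
hpByCylAm <- tapply(mtcars$hp, list(mtcars$cyl, mtcars$am), FUN = mean)
hpByCylAm

# Any summary function works as FUN, e.g. counting the cars in each cylinder group
carsPerCyl <- tapply(mtcars$hp, mtcars$cyl, FUN = length)
carsPerCyl
```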