What is the difference between sapply and a for loop?
Let’s make a small data set from iris.
# Get 9 rows (3 from each species) and drop the Species column.
> iris_small = iris[c(1:3,51:53,101:103),-5]
> iris_small
    Sepal.Length Sepal.Width Petal.Length Petal.Width
1            5.1         3.5          1.4         0.2
2            4.9         3.0          1.4         0.2
3            4.7         3.2          1.3         0.2
51           7.0         3.2          4.7         1.4
52           6.4         3.2          4.5         1.5
53           6.9         3.1          4.9         1.5
101          6.3         3.3          6.0         2.5
102          5.8         2.7          5.1         1.9
103          7.1         3.0          5.9         2.1
Say we want the mean of each column.
1. You can go the apply() route
> apply(iris_small, 2, mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    6.022222     3.133333     3.911111     1.277778
The second parameter to apply() (called MARGIN) dictates whether the operation is done row-wise or column-wise. In this case we wanted the mean of each column, which is why we passed 2. If we wanted the mean of each row, we would pass 1 instead.
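To make the row-wise versus column-wise distinction concrete, here is a minimal sketch that calls apply() both ways on the same data. The variable names row_means and col_means are made up for this illustration.

```r
# Rebuild the small data set so this snippet runs on its own
iris_small <- iris[c(1:3, 51:53, 101:103), -5]

# MARGIN = 1: apply mean() to each row (one value per flower)
row_means <- apply(iris_small, 1, mean)

# MARGIN = 2: apply mean() to each column (one value per measurement)
col_means <- apply(iris_small, 2, mean)

length(row_means)  # one entry per row
length(col_means)  # one entry per column
```

With 9 rows and 4 columns, the row-wise call returns 9 values and the column-wise call returns 4.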

2. Or use a loop (say, a for loop)
for (col in colnames(iris_small)) {
  # [[col]] extracts the column as a plain numeric vector
  m = mean(iris_small[[col]])
  print(m)
}
[1] 6.022222
[1] 3.133333
[1] 3.911111
[1] 1.277778
Which is faster: apply() or a for loop?
apply() is essentially a for loop under the hood, so neither is meaningfully faster than the other. The advantage of apply() is readability. A for loop comes with some paraphernalia:
- Specifying what to iterate over
- Curly braces to open and close the loop body, etc.
which can make things a little less readable.
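As a rough illustration of the readability point, the sketch below computes the same column means twice and keeps the results instead of printing them. It uses sapply(), which iterates over the columns of a data frame directly; the names means_loop and means_sapply are made up for this example.

```r
# Rebuild the small data set so this snippet runs on its own
iris_small <- iris[c(1:3, 51:53, 101:103), -5]

# for loop: pre-allocate a named result vector, then fill it in
means_loop <- numeric(ncol(iris_small))
names(means_loop) <- colnames(iris_small)
for (col in colnames(iris_small)) {
  means_loop[col] <- mean(iris_small[[col]])
}

# sapply: the same computation in a single line
means_sapply <- sapply(iris_small, mean)

# Both approaches should give the same named numeric vector
all.equal(means_loop, means_sapply)
```

The loop needs the bookkeeping for the result vector spelled out, while sapply() handles it for you.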
Limitations of apply()
apply() is designed for matrices and arrays. When you pass it a data frame, the data frame is first converted to a matrix, and a matrix can hold only one data type. If there is a string column, for example, all of the numeric elements are coerced to strings. This can be a problem when you want to perform row-wise operations on a data frame with mixed column types (which are rare anyway). However, if all of the columns are numeric, as in iris_small, you are good to go.
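The coercion is easy to see on a tiny example. This is a sketch with a made-up data frame (df, with hypothetical columns name, x and y); it shows that converting a mixed data frame to a matrix yields strings, and one possible workaround, which is to keep only the numeric columns before calling apply().

```r
# A made-up data frame with one string column and two numeric columns
df <- data.frame(name = c("a", "b"), x = c(1, 3), y = c(3, 5))

# Converting to a matrix forces a single type: everything becomes character
m <- as.matrix(df)
is.character(m)

# Workaround: select only the numeric columns, then go row-wise
numeric_cols <- df[sapply(df, is.numeric)]
row_means <- apply(numeric_cols, 1, mean)
row_means
```

Selecting the numeric columns first keeps the matrix numeric, so mean() receives numbers rather than strings.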
Summary
