What is the difference between sapply and a for loop?



Let’s make a small data set from iris.

# Take 9 rows (3 from each species) and drop the Species column.
> iris_small = iris[c(1:3,51:53,101:103),-5]
> iris_small
    Sepal.Length Sepal.Width Petal.Length Petal.Width
1            5.1         3.5          1.4         0.2
2            4.9         3.0          1.4         0.2
3            4.7         3.2          1.3         0.2
51           7.0         3.2          4.7         1.4
52           6.4         3.2          4.5         1.5
53           6.9         3.1          4.9         1.5
101          6.3         3.3          6.0         2.5
102          5.8         2.7          5.1         1.9
103          7.1         3.0          5.9         2.1

Say we want the mean of each column.

1. You can go the apply() route

> apply(iris_small, 2, mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    6.022222     3.133333     3.911111     1.277778

The second argument to apply() (named MARGIN) determines whether the function is applied row-wise or column-wise. In this case we wanted the column means, so we passed 2. If we wanted the mean of each row, we would pass 1.

2. Or use a loop (say, a for loop)
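To make the row-wise versus column-wise distinction concrete, here is a minimal sketch that calls apply() with both margin values on the same 9-row sample. The variable names row_means and col_means are only illustrative.

```r
# Rebuild the 9-row sample used above (iris ships with base R).
iris_small <- iris[c(1:3, 51:53, 101:103), -5]

# MARGIN = 1: one mean per row (across the four measurements).
row_means <- apply(iris_small, 1, mean)

# MARGIN = 2: one mean per column, as in the call above.
col_means <- apply(iris_small, 2, mean)

print(row_means)   # 9 values, one per row
print(col_means)   # 4 values, one per column
```

With MARGIN = 1 the result has as many elements as there are rows; with MARGIN = 2 it has one element per column.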

for(col in colnames(iris_small)){
  m = mean(iris_small[[col]])
  print(m)
}
[1] 6.022222
[1] 3.133333
[1] 3.911111
[1] 1.277778

Which is faster: apply() or a for loop?

apply() is essentially a for loop under the hood, so neither is meaningfully faster than the other. The advantage of apply() is readability. A for loop comes with some paraphernalia:

  • Specifying the loop variable and what it iterates over
  • Opening and closing curly braces, etc.

which can make things a little less readable.
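The readability point can be sketched by collecting the column means both ways. The example below uses sapply(), a close relative of apply() that iterates over the columns of a data frame. The names loop_means and sapply_means are illustrative, not from the original post.

```r
# Rebuild the 9-row sample used above (iris ships with base R).
iris_small <- iris[c(1:3, 51:53, 101:103), -5]

# for loop: pre-allocate a named result vector, then fill it column by column.
loop_means <- numeric(ncol(iris_small))
names(loop_means) <- colnames(iris_small)
for (col in colnames(iris_small)) {
  loop_means[[col]] <- mean(iris_small[[col]])
}

# sapply: the iteration and the result collection are handled for you.
sapply_means <- sapply(iris_small, mean)

# Check that both approaches produce the same named numeric vector.
print(all.equal(loop_means, sapply_means))
```

The loop needs four lines of setup and bookkeeping to do what the sapply() call does in one.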

Limitations of apply()

apply() is designed for matrices and arrays. When you pass it a data frame, it first converts the data frame to a matrix with as.matrix(), and a matrix can hold only one type. If any column holds strings, for example, all the numeric elements are coerced to strings as well. This can be a problem when you want to perform row-wise operations on a data frame (which are rare anyway). However, when you are doing column-wise operations and all of the columns are numeric, you are good to go.
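The coercion is easy to see by putting the Species column back. This is a minimal sketch: iris_mixed and numeric_cols are hypothetical names introduced for illustration, and selecting the numeric columns first is one possible workaround, not the only one.

```r
# Hypothetical mixed-type sample: same 9 rows, but keep the Species column.
iris_mixed <- iris[c(1:3, 51:53, 101:103), ]

# apply() converts a data frame with as.matrix(); a single non-numeric
# column turns every cell of the resulting matrix into a string.
m <- as.matrix(iris_mixed)
print(typeof(m))

# For column-wise work, select the numeric columns first.
numeric_cols <- sapply(iris_mixed, is.numeric)
print(apply(iris_mixed[, numeric_cols], 2, mean))
```

Once the non-numeric column is filtered out, the column means match the earlier result.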

