Why are box plots used in R
If you already know how to create box plots in R, let’s dig into the “why” of box plots.
- Visually represent the following attributes:
  - minimum
  - maximum
  - median
  - quartiles
  - outliers
# box plot for the "sepal length" of the setosa species
> b = boxplot(Sepal.Length ~ Species, data = iris[iris$Species=="setosa",])
> b$stats
     [,1] [,2] [,3]
[1,]  4.3   NA   NA
[2,]  4.8   NA   NA
[3,]  5.0   NA   NA
[4,]  5.2   NA   NA
[5,]  5.8   NA   NA
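
The five rows of `b$stats` are, in order, the lower whisker, lower hinge (roughly the first quartile), median, upper hinge (roughly the third quartile), and upper whisker. The whiskers coincide with the minimum and maximum only when the group has no outliers. The `NA` columns appear because the `Species` factor still carries the two unused levels after subsetting. A minimal sketch that drops those levels and labels the rows; the variable names `setosa` and `stats` are only illustrative:

```r
# Keep only setosa and drop the unused factor levels so that boxplot()
# returns a single column instead of padding the other species with NA.
setosa <- droplevels(iris[iris$Species == "setosa", ])

# plot = FALSE computes the statistics without drawing the plot.
b <- boxplot(Sepal.Length ~ Species, data = setosa, plot = FALSE)

# Label the five rows of b$stats in the order boxplot() returns them.
stats <- b$stats
dimnames(stats) <- list(
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"),
  b$names
)
print(stats)

# Points lying beyond the whiskers are returned separately in b$out.
print(b$out)
```

Remove `plot = FALSE` to draw the plot as well as compute the numbers.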

- Visually compare distributions
> b = boxplot(Sepal.Length ~ Species, data = iris)
> b$stats
     [,1] [,2] [,3]
[1,]  4.3  4.9  5.6
[2,]  4.8  5.6  6.2
[3,]  5.0  5.9  6.5
[4,]  5.2  6.3  6.9
[5,]  5.8  7.0  7.9
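
`b$stats` holds the whisker ends rather than the raw extremes, so outliers do not show up in it. The sketch below assumes the default whisker rule (1.5 times the box height) and lists the outliers through `b$out` and `b$group`, then compares each lower whisker with the raw minimum. The names `outliers` and `raw_min` are only illustrative:

```r
# Compute the box plot statistics for all three species without drawing.
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# b$out holds the outlying values; b$group holds the column (species)
# index that each of those values belongs to.
outliers <- data.frame(
  species = b$names[b$group],
  value   = b$out
)
print(outliers)

# Compare the lower whisker with the raw minimum of each species.
# They differ wherever a low outlier exists.
raw_min <- tapply(iris$Sepal.Length, iris$Species, min)
print(rbind(lower_whisker = b$stats[1, ], raw_min = raw_min))
```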

You can quickly draw conclusions about how the data is spread across groups, in this case species. For example:
- Sepal length differs noticeably between the species, which is good news for classification.
- Setosa's sepal lengths fall in a clearly lower range than those of the other two species, while versicolor and virginica overlap to some extent.
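
These visual impressions can also be checked numerically. The sketch below compares the boxes (the hinge-to-hinge ranges in rows 2 and 4 of `b$stats`) for pairs of species. `boxes_overlap` is a hypothetical helper written for this illustration, not part of R, and comparing boxes rather than whiskers is just one reasonable way to measure overlap:

```r
# Compute the box plot statistics without drawing the plot.
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# Rows 2 and 4 of b$stats are the lower and upper hinges (the box edges).
boxes <- b$stats[c(2, 4), ]
dimnames(boxes) <- list(c("lower", "upper"), b$names)
print(boxes)

# Hypothetical helper: two boxes overlap when each one starts
# before the other one ends.
boxes_overlap <- function(x, y) {
  boxes["lower", x] <= boxes["upper", y] &&
    boxes["lower", y] <= boxes["upper", x]
}

print(boxes_overlap("setosa", "versicolor"))
print(boxes_overlap("versicolor", "virginica"))
```

Using rows 1 and 5 instead would compare the whisker ranges, a looser test that reports overlap more readily.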