Why are box plots used in R



If you already know how to create box plots in R, let’s dig into the “why” of box plots.

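  • To visually represent the following attributes of a numeric variable:
    • minimum (lower whisker end, excluding outliers)
    • maximum (upper whisker end, excluding outliers)
    • median (the line inside the box)
    • quartiles (the lower and upper edges of the box)
    • outliers (individual points beyond the whiskers)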
# box plot of sepal length for the setosa species only
# (droplevels() removes the now-empty versicolor and virginica levels,
#  which would otherwise show up as empty boxes and NA columns)
> b = boxplot(Sepal.Length ~ Species, data = droplevels(iris[iris$Species == "setosa", ]))
> b$stats
     [,1]
[1,]  4.3
[2,]  4.8
[3,]  5.0
[4,]  5.2
[5,]  5.8
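If you want to see where these five numbers come from, the sketch below recomputes them for setosa with base R's fivenum() and boxplot.stats(), which is what boxplot() calls for each group. It assumes the default whisker rule (coef = 1.5): any point more than 1.5 times the box length beyond the box is flagged as an outlier. The fence variable names are only for illustration.

```r
# Sepal length of the setosa species (iris ships with R's datasets package)
x <- iris$Sepal.Length[iris$Species == "setosa"]

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)

# boxplot() gets its numbers from boxplot.stats(); with coef = 1.5 the whiskers
# stop at the most extreme points within 1.5 * (upper hinge - lower hinge) of the box
s <- boxplot.stats(x, coef = 1.5)

# Recompute the outlier fences by hand (illustrative names)
hinge_spread <- five[4] - five[2]
lower_fence <- five[2] - 1.5 * hinge_spread
upper_fence <- five[4] + 1.5 * hinge_spread
manual_out <- x[x < lower_fence | x > upper_fence]

print(five)      # equals s$stats whenever there are no outliers
print(s$stats)
print(s$out)     # points beyond the fences
```

For setosa no point falls outside the fences, so the whiskers reach the actual minimum and maximum.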

  • To visually compare distributions across groups
# box plot of sepal length, one box per species
> b = boxplot(Sepal.Length ~ Species, data = iris)
> b$stats
     [,1] [,2] [,3]
[1,]  4.3  4.9  5.6
[2,]  4.8  5.6  6.2
[3,]  5.0  5.9  6.5
[4,]  5.2  6.3  6.9
[5,]  5.8  7.0  7.9
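b$stats prints its five rows without labels. A small sketch that names the rows (lower whisker, lower hinge, median, upper hinge, upper whisker) and the columns, then lists any outliers using b$out and b$group. The row labels are my own, not something boxplot() returns, and plot = FALSE only skips the drawing.

```r
# Same per-species box plot; plot = FALSE returns the statistics without drawing
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# Label the five rows of b$stats and name the columns after the species
st <- b$stats
dimnames(st) <- list(
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"),
  b$names
)
print(st)

# Outliers: b$out holds the values, b$group the box (column) each belongs to
outliers <- data.frame(species = b$names[b$group], value = b$out)
print(outliers)
```

With iris this flags a single virginica sepal length of 4.9 as an outlier, which is why the virginica lower whisker stops at 5.6 rather than at the species minimum.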

You can quickly draw conclusions about how the data are spread across groups, in this case the three species. For example, you can see that:

  • the median sepal length differs clearly across the species (5.0, 5.9 and 6.5 cm). That is good news for classification.
  • the box of setosa (4.8 to 5.2 cm) does not overlap the boxes of the other two species, whereas the versicolor (5.6 to 6.3 cm) and virginica (6.2 to 6.9 cm) boxes overlap slightly.
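The overlap conclusions can also be checked numerically from b$stats. A minimal sketch, assuming "overlap" simply means two intervals intersect: it compares the boxes (hinges) and the whiskers for every pair of species. pair_overlap() is a hypothetical helper written for this example.

```r
# Box statistics per species; plot = FALSE skips drawing
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
s <- b$stats
colnames(s) <- b$names

# All species pairs, one pair per column
pairs <- combn(b$names, 2)

# Hypothetical helper: for each pair, do the intervals
# [s[lo_row, ], s[hi_row, ]] overlap? Two intervals overlap
# when each one starts before the other one ends.
pair_overlap <- function(lo_row, hi_row) {
  vapply(seq_len(ncol(pairs)), function(k) {
    p <- pairs[1, k]
    q <- pairs[2, k]
    s[lo_row, p] <= s[hi_row, q] && s[lo_row, q] <= s[hi_row, p]
  }, logical(1))
}

result <- data.frame(
  pair = paste(pairs[1, ], pairs[2, ], sep = " vs "),
  boxes_overlap = pair_overlap(2, 4),    # rows 2 and 4: lower and upper hinge
  whiskers_overlap = pair_overlap(1, 5)  # rows 1 and 5: whisker ends
)
print(result)
```

For iris only the versicolor and virginica boxes overlap, while the whiskers of every pair overlap, so sepal length on its own will not separate the species perfectly.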
