How to visualize multi-dimensional data in R


Data with more than two dimensions is hard to show on a flat plot. Here are some ways to work extra variables into a plot, using the built-in iris data set.

2D Scatter plots

Scatter plots are typically two-dimensional. You can use the following methods to include more variables in the plot.

Color

Color can be used to map a third variable in a scatter plot. This is typically used when the third variable is categorical.

attach(iris)  # makes the iris columns available by name
plot(Sepal.Length, Sepal.Width, col = Species)
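A plot colored by a factor is hard to read without a key. The following is a minimal sketch, assuming the default palette() and base graphics only. It uses with(iris, ...) instead of attach(), and the variable name species_cols is illustrative rather than part of the original example.

```r
# Map each species to a color from the current palette via its factor code.
species_cols <- palette()[as.integer(iris$Species)]

with(iris, plot(Sepal.Length, Sepal.Width,
                col = species_cols, pch = 19,
                xlab = "Sepal length", ylab = "Sepal width"))

# The legend lists the factor levels in the same order as the palette indices.
legend("topright", legend = levels(iris$Species),
       col = palette()[seq_along(levels(iris$Species))], pch = 19)
```

Because the legend is built from levels(), it stays in step with the colors even if the rows of the data are reordered.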

Shape

Shape can also be used to show the third variable. As with color, using shape for the third variable makes sense when it is categorical. You can also combine color and shape, as below.

plot(Sepal.Length, Sepal.Width, type = "p",
     pch = c(16, 17, 18)[as.numeric(Species)],               # one symbol per species
     col = c("red", "green", "blue")[as.numeric(Species)])   # one color per species
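Indexing by as.numeric(Species) relies on the order of the factor levels. As a hedged alternative, the sketch below keys the symbols and colors by species name and adds a legend. The names shape_by_species and color_by_species are hypothetical helpers, not part of the original example.

```r
# Hypothetical lookup tables keyed by species name (the names come from iris).
shape_by_species <- c(setosa = 16, versicolor = 17, virginica = 18)
color_by_species <- c(setosa = "red", versicolor = "green", virginica = "blue")

# Index by name, not by factor code, so the mapping survives level reordering.
sp <- as.character(iris$Species)

with(iris, plot(Sepal.Length, Sepal.Width,
                pch = shape_by_species[sp],
                col = color_by_species[sp]))

legend("topright", legend = names(shape_by_species),
       pch = shape_by_species, col = color_by_species)
```

Using both channels for the same variable is redundant on purpose: the groups stay distinguishable in a grayscale printout.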

Size

If the third variable is continuous, you can use the size of the points to show large versus small values.

plot(Sepal.Length, Sepal.Width, type = "p",
     cex = Petal.Length,  # point size scales with petal length
     bg = Species,        # fill color, used by pch 21
     pch = 21)
legend("topleft", legend = levels(Species),
       pt.bg = 1:3,       # same palette indices as bg = Species
       pch = 21)

The parameter cex controls the scaling of the dots. In fact, we were able to plot 4 variables here:

  • Sepal.Length (x-axis)
  • Sepal.Width (y-axis)
  • Species (with color)
  • Petal.Length (with size)
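Since cex is a multiplier on the default symbol size, passing raw Petal.Length values (which go up to 6.9 in iris) draws large, overlapping points. One possible fix, sketched here under the assumption that a linear rescale is acceptable, maps the values onto a smaller range first. The helper rescale_cex and its default bounds are hypothetical choices.

```r
# Hypothetical helper: linearly rescale x to the interval [lo, hi].
# Assumes x is not constant (otherwise the denominator is zero).
rescale_cex <- function(x, lo = 0.5, hi = 3) {
  rng <- range(x, na.rm = TRUE)
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

sizes <- rescale_cex(iris$Petal.Length)

with(iris, plot(Sepal.Length, Sepal.Width,
                cex = sizes,
                bg = as.integer(Species),  # fill color, used by pch 21 to 25
                pch = 21))
```

The ordering of the sizes is unchanged, so the plot supports the same reading with less overplotting.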

One conclusion you could draw here is that petal length is generally greater for the virginica species than for setosa.
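A visual impression like this is easy to check numerically. The sketch below computes the mean petal length per species with base R; the variable name mean_petal is illustrative.

```r
# Mean petal length for each species, rounded for display.
mean_petal <- tapply(iris$Petal.Length, iris$Species, mean)
print(round(mean_petal, 2))

# TRUE if the virginica mean exceeds the setosa mean.
unname(mean_petal["virginica"] > mean_petal["setosa"])
```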

3D Scatter plots

Three-dimensional plots take this to the next level. You can use the “z” dimension to map a third variable.

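library(scatterplot3d)  # install.packages("scatterplot3d") if needed
# One color per species, indexed by factor code
cols = c("#FF0000", "#00FF00", "#0000FF")[as.numeric(Species)]
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length,
              pch = 16,      # solid dots, so no separate fill color is needed
              color = cols)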
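Depth is hard to judge in a static 3D plot. As a rough sketch, assuming the scatterplot3d package is installed, the variant below drops a vertical line from each point to the x-y plane, changes the viewing angle, and adds a legend. The angle of 55 is an arbitrary choice, and the names cols3d and species_palette are illustrative.

```r
# One color per species, indexed by factor code.
species_palette <- c("#FF0000", "#00FF00", "#0000FF")
cols3d <- species_palette[as.integer(iris$Species)]

# Only plot when the scatterplot3d package is available.
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  with(iris, scatterplot3d::scatterplot3d(
    Sepal.Length, Sepal.Width, Petal.Length,
    color = cols3d, pch = 16,
    type = "h",    # vertical lines from each point to the x-y plane
    angle = 55))   # angle between the x and y axes
  legend("topleft", legend = levels(iris$Species),
         col = species_palette, pch = 16)
}
```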
