What is the difference between dataframe and matrix in R


Let’s create an employee table.

install.packages("randomNames")
library(randomNames)
# Get 100 random names (stored as "names", the variable used in the data frame below)
names = randomNames(100)
# Get 100 random ages, normally distributed around 30
age = round(rnorm(100, mean = 30, sd = 10))

Now, let’s create a data frame with just 2 columns: names and age.

employees = data.frame(names, age, stringsAsFactors=FALSE) 
> str(employees)
'data.frame': 100 obs. of  2 variables:
 $ names: chr  "Persons, Shelby" "Taylor, Chukwuma" "Jarvis, Destiny" "Rape, Zachery" ...
 $ age  : num  13 20 31 42 23 37 27 27 20 22 ...

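You could do this because a data frame can contain columns of different types. In this case, names is a character column and age is a numeric column.

Can you do this with a matrix? No. A matrix can hold only one type of data, so combining names and ages in a matrix would coerce every element to a single common type (here, character).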
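To make the coercion concrete, here is a minimal sketch. The three-row sample data and the variable names (emp_names, emp_age, m, df) are made up for illustration and are not the random names generated above.

```r
# Hypothetical sample data (not the article's randomNames output)
emp_names <- c("Ann", "Bob", "Cy")
emp_age   <- c(31, 42, 23)

# cbind() builds a matrix; mixing character and numeric forces one common type
m <- cbind(name = emp_names, age = emp_age)
typeof(m)    # "character"
m[, "age"]   # the ages are now strings: "31" "42" "23"

# A data frame keeps a separate type for each column
df <- data.frame(name = emp_names, age = emp_age, stringsAsFactors = FALSE)
sapply(df, class)
```

In the matrix, arithmetic on the age column no longer works without converting it back with as.numeric(), while df$age stays numeric.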

Convert a data frame to a matrix

If you try to convert this data frame to a numeric matrix with data.matrix(), look at what happens.

> employees_m = data.matrix(employees)
Warning message:
In data.matrix(employees) : NAs introduced by coercion

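What does the matrix contain? The names (character) column could not be converted to numbers, so every value in it was coerced to NA. (The exact result may depend on your R version: the documentation for newer releases says data.matrix() converts character columns to factors and then to integer codes, in which case you would see integers instead of NAs.)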

> str(employees_m)
 num [1:100, 1:2] NA NA NA NA NA NA NA NA NA NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "names" "age"

As you can see, the data in the names column is gone.

> head(employees_m)
     names age
[1,]    NA  13
[2,]    NA  20
[3,]    NA  31
[4,]    NA  42
[5,]    NA  23
[6,]    NA  37
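If the goal is to avoid losing the names, one option is as.matrix(), which falls back to a character matrix when the columns are mixed. The sketch below uses a small made-up data frame (df, char_m and num_m are illustrative names, not objects from the session above).

```r
# Hypothetical sample data frame with one character and one numeric column
df <- data.frame(name = c("Ann", "Bob", "Cy"),
                 age  = c(31, 42, 23),
                 stringsAsFactors = FALSE)

# as.matrix() on mixed columns yields a character matrix: names survive,
# but the ages become strings
char_m <- as.matrix(df)
typeof(char_m)                # "character"
as.numeric(char_m[, "age"])   # convert back when numbers are needed

# Converting only the numeric column keeps a numeric matrix
num_m <- as.matrix(df["age"])
typeof(num_m)                 # "double"
```

Either way the matrix ends up with a single type; the choice is only which type that is.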

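Here are the differences:

1. A matrix is homogeneous (every element has the same type), but a data frame can be heterogeneous (each column can have its own type).

2. A data frame can have factor columns, but a matrix cannot store factors; a factor placed in a matrix is reduced to plain character or integer values.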
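The second point can be checked with a short sketch. The dept factor below is a made-up example and is not part of the employee table above.

```r
# Hypothetical factor column
dept <- factor(c("HR", "IT", "HR"))

# In a data frame the column stays a factor, with its levels intact
df <- data.frame(dept)
is.factor(df$dept)   # TRUE
levels(df$dept)      # "HR" "IT"

# matrix() strips the factor class and stores the labels as plain strings
m <- matrix(dept, ncol = 1)
is.factor(m)         # FALSE
typeof(m)            # "character"
```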
