What is the difference between a data frame and a matrix in R?
Let’s create an employee table.
install.packages("randomNames")
library(randomNames)
# Get 100 random names
names = randomNames(100)
# Get 100 random ages
age = round(rnorm(100,mean = 30, sd = 10))
Now, let’s create a data frame with just two columns: names and age.
employees = data.frame(names, age, stringsAsFactors=FALSE)
> str(employees)
'data.frame': 100 obs. of 2 variables:
$ names: chr "Persons, Shelby" "Taylor, Chukwuma" "Jarvis, Destiny" "Rape, Zachery" ...
$ age : num 13 20 31 42 23 37 27 27 20 22 ...
You could do this because a data frame can contain columns of different types. In this case, names is a character column and age is a numeric column.
Can you do this with a matrix? No. A matrix can hold only one type of data, so mixing strings and numbers forces every element to a single common type.
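To see the one-type rule in action, here is a minimal sketch. It uses three hand-typed values (hypothetical stand-ins for the random names and ages above) and builds a matrix with cbind(); the numeric ages are silently converted to strings.

```r
# Hypothetical toy data standing in for the employee vectors
names = c("Persons, Shelby", "Taylor, Chukwuma", "Jarvis, Destiny")
age = c(13, 20, 31)

# cbind() on plain vectors returns a matrix; mixed types collapse to one type
m = cbind(names, age)

typeof(m)               # the whole matrix is character
m[, "age"]              # the ages are now strings, not numbers
is.numeric(m[, "age"])  # FALSE
```

Arithmetic on the age column of this matrix would fail until the values are converted back with as.numeric().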
Convert a data frame to a matrix
If you try to convert this data frame to a matrix, look at what happens.
> employees_m = data.matrix(employees)
Warning message:
In data.matrix(employees) : NAs introduced by coercion
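What does the matrix contain? data.matrix() always builds a numeric matrix, and in the R version used here the names (string) column was coerced to NAs. (From R 4.0.0 onward, data.matrix() instead converts character columns to factors and then to integer codes, so you would see integers rather than NAs. Either way, the original strings are lost.)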
> str(employees_m)
num [1:100, 1:2] NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "names" "age"
As you can see, the data in the names column is gone.
> head(employees_m)
names age
[1,] NA 13
[2,] NA 20
[3,] NA 31
[4,] NA 42
[5,] NA 23
[6,] NA 37
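If the goal is to keep the names, as.matrix() is an alternative worth knowing. The sketch below uses a two-row data frame with hypothetical values instead of the random data; as.matrix() returns a character matrix, so the strings survive, but the ages become strings as well.

```r
# Small stand-in for the employees data frame (hypothetical values)
employees = data.frame(names = c("Persons, Shelby", "Taylor, Chukwuma"),
                       age = c(13, 20),
                       stringsAsFactors = FALSE)

# as.matrix() picks a common type that can hold every column: character
employees_c = as.matrix(employees)

typeof(employees_c)      # character, not numeric
employees_c[, "names"]   # names are preserved
```

The trade-off is the same as before: a matrix has one type, so you keep either the numbers (data.matrix) or the strings (as.matrix), not both.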
Here are the differences:
1. A matrix is homogeneous (every element has the same type), but a data frame can be heterogeneous (each column can have its own type).
2. A data frame can have factor columns, but a matrix cannot.
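The second point can be checked with a short sketch. The dept column is a hypothetical addition that is not part of the employee table above; it stays a factor inside the data frame, but loses its factor class once the data frame is converted to a matrix.

```r
# Hypothetical factor column, used only for illustration
dept = factor(c("HR", "IT", "HR"))
df = data.frame(dept = dept, age = c(13, 20, 31))

is.factor(df$dept)       # TRUE: the data frame keeps the factor

m = as.matrix(df)
is.factor(m[, "dept"])   # FALSE: the matrix stores plain strings
typeof(m)                # character
```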