What is the difference between dataframe and matrix in R
Let’s create an employee table.
install.packages("randomNames")
library(randomNames)

# Get 100 random names
names = randomNames(100)

# Get 100 random ages
age = round(rnorm(100, mean = 30, sd = 10))
Now, let’s create a data frame with just two columns, names and age.
> employees = data.frame(names, age, stringsAsFactors=FALSE)
> str(employees)
'data.frame':	100 obs. of  2 variables:
 $ names: chr  "Persons, Shelby" "Taylor, Chukwuma" "Jarvis, Destiny" "Rape, Zachery" ...
 $ age  : num  13 20 31 42 23 37 27 27 20 22 ...
This works because a data frame can contain columns of different types. In this case, names is a character (string) column and age is a numeric column.
Can you do this with a matrix? No. A matrix can only contain one type of data, so if you mix types, every element is coerced to a single common type.
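To see that coercion in action, here is a minimal sketch. It uses a tiny hand-made sample (the vectors `emp_names` and `emp_age` are made up for illustration and are not the employee data above) and binds a character vector and a numeric vector into one matrix.

```r
# Hypothetical sample data, made up for this illustration
emp_names <- c("Ann", "Bob", "Cid")
emp_age <- c(31, 42, 23)

# cbind() on two vectors builds a matrix, which must have a single type
m <- cbind(names = emp_names, age = emp_age)

is.matrix(m)   # TRUE
typeof(m)      # "character": the numbers were coerced to strings
m[, "age"]     # the ages are now character values, not numbers
```

R does not raise an error here. It silently picks the most general type (character), which means you can no longer do arithmetic on the age column without converting it back.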
Convert a dataframe to Matrix
If you try to convert this dataframe to a matrix, look at what happens.
> employees_m = data.matrix(employees)
Warning message:
In data.matrix(employees) : NAs introduced by coercion
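What does the matrix contain? data.matrix() always builds a numeric matrix, and the names (string) column cannot be read as numbers, so it was coerced to NAs. (In R 4.0.0 and later, data.matrix() instead converts character columns to factors and then to integer codes, so you would see arbitrary integers rather than NAs. Either way, the original names are lost.)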
> str(employees_m)
 num [1:100, 1:2] NA NA NA NA NA NA NA NA NA NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "names" "age"
As you can see, the data in the names column is gone.
> head(employees_m)
     names age
[1,]    NA  13
[2,]    NA  20
[3,]    NA  31
[4,]    NA  42
[5,]    NA  23
[6,]    NA  37
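If you want to keep the names, one alternative is as.matrix(), which falls back to a character matrix when the data frame has non-numeric columns. The sketch below uses a small made-up data frame (`df` is hypothetical, not the employees table) to show the trade-off: the names survive, but the ages become strings.

```r
# Hypothetical two-row data frame, made up for this illustration
df <- data.frame(names = c("Ann", "Bob"),
                 age = c(31, 42),
                 stringsAsFactors = FALSE)

# as.matrix() keeps the text, at the cost of turning everything into strings
m <- as.matrix(df)

typeof(m)       # "character"
m[, "names"]    # the names are preserved
is.numeric(m)   # FALSE: the age column is no longer numeric
```

So the choice is between data.matrix(), which keeps the result numeric and loses the text, and as.matrix(), which keeps the text and loses the numeric type.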
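Here are the differences:
1. A matrix is homogeneous (every element has the same type), but a data frame can be heterogeneous (each column can have its own type).
2. A data frame can hold factor columns, but a matrix cannot. When a data frame is converted to a matrix, factors are coerced to character strings or integer codes.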
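The second point can be checked with a short sketch. The data frame below is made up for illustration (the `dept` column and its values are hypothetical) and shows what happens to a factor column under each conversion function.

```r
# Hypothetical data frame with a factor column, made up for this illustration
df <- data.frame(dept = factor(c("HR", "IT", "HR")),
                 age = c(31, 42, 23))

is.factor(df$dept)        # TRUE: the data frame keeps the factor

# as.matrix() turns the factor into plain character strings
m <- as.matrix(df)
is.factor(m[, "dept"])    # FALSE
typeof(m)                 # "character"

# data.matrix() replaces the factor with its integer level codes
dm <- data.matrix(df)
dm[, "dept"]              # codes 1 2 1 (levels sort as HR = 1, IT = 2)
```

In both cases the factor's level information is no longer attached to the column, which is why factors are usually kept in data frames.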