How to remove duplicate rows from a data frame in R


1. duplicated() function

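The sample data frame iris_dup has 12 rows. The last 3 rows (row names 511, 521 and 531) are exact copies of rows 4, 5 and 6 (row names 51, 52 and 53).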

> iris_dup
    Sepal.Length Sepal.Width Petal.Length petal.Width    species
1            5.1         3.5          1.4         0.2     Setosa
2            4.9         3.0          1.4         0.2     Setosa
3            4.7         3.2          1.3         0.2     Setosa
51           7.0         3.2          4.7         1.4  Virginica
52           6.4         3.2          4.5         1.5  Virginica
53           6.9         3.1          4.9         1.5  Virginica
101          6.3         3.3          6.0         2.5 Versicolor
102          5.8         2.7          5.1         1.9 Versicolor
103          7.1         3.0          5.9         2.1 Versicolor
511          7.0         3.2          4.7         1.4  Virginica
521          6.4         3.2          4.5         1.5  Virginica
531          6.9         3.1          4.9         1.5  Virginica
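The article does not show how iris_dup was created. The following is a minimal sketch of one way to build a comparable data frame, assuming it is derived from R's built-in iris data set with rbind(). The helper name base_rows is made up for illustration, and because the built-in data set uses the column names Petal.Width and Species with lowercase species labels, the printed result may differ from the listing above in those details.

```r
# Assumption: start from the built-in iris data set.
# Pick nine distinct rows (three from each block of 50).
base_rows <- iris[c(1:3, 51:53, 101:103), ]

# Append a second copy of rows 51-53 so that the last three rows
# repeat rows 4, 5 and 6 of the combined data frame.
iris_dup <- rbind(base_rows, iris[51:53, ])

nrow(iris_dup)  # total number of rows, duplicates included
```

Because rbind() has to keep row names unique, the appended copies get new row names rather than reusing 51, 52 and 53.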

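Step 1 – Find the duplicates

duplicated() returns a logical vector with one element per row. A row is TRUE only if an identical row appears earlier in the data frame, so the first occurrence of each row is always FALSE.

> duplicated(iris_dup)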
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
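The logical vector can also be summarised instead of read by eye. This is a small sketch, assuming iris_dup is rebuilt from the built-in iris data set as a stand-in for the article's data; the variable name dup_flags is illustrative.

```r
# Assumption: rebuild a stand-in for iris_dup from the built-in iris data.
iris_dup <- rbind(iris[c(1:3, 51:53, 101:103), ], iris[51:53, ])

dup_flags <- duplicated(iris_dup)

sum(dup_flags)           # how many rows are flagged as duplicates
which(dup_flags)         # positions of the flagged rows
anyDuplicated(iris_dup)  # position of the first duplicate, 0 if there is none
```

anyDuplicated() is handy as a quick check before deciding whether any de-duplication is needed at all.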

Step 2 – Use this vector to select only the duplicate rows

> iris_dup[duplicated(iris_dup),]
    Sepal.Length Sepal.Width Petal.Length petal.Width   species
511          7.0         3.2          4.7         1.4 Virginica
521          6.4         3.2          4.5         1.5 Virginica
531          6.9         3.1          4.9         1.5 Virginica
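The selection above returns only the later copies, not the original rows they repeat. If every member of a duplicated group is wanted, one possible approach is to combine a forward and a backward scan using the fromLast argument of duplicated(). The sketch below assumes iris_dup is rebuilt from the built-in iris data set; in_dup_group is an illustrative name.

```r
# Assumption: rebuild a stand-in for iris_dup from the built-in iris data.
iris_dup <- rbind(iris[c(1:3, 51:53, 101:103), ], iris[51:53, ])

# Forward scan flags later copies; backward scan (fromLast = TRUE)
# flags earlier copies. Their union marks every row that has a twin.
in_dup_group <- duplicated(iris_dup) | duplicated(iris_dup, fromLast = TRUE)

iris_dup[in_dup_group, ]
```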

Alternatively, negate the vector with ! to keep only the first occurrence of each row.

> iris_dup[!duplicated(iris_dup),]
    Sepal.Length Sepal.Width Petal.Length petal.Width    species
1            5.1         3.5          1.4         0.2     Setosa
2            4.9         3.0          1.4         0.2     Setosa
3            4.7         3.2          1.3         0.2     Setosa
51           7.0         3.2          4.7         1.4  Virginica
52           6.4         3.2          4.5         1.5  Virginica
53           6.9         3.1          4.9         1.5  Virginica
101          6.3         3.3          6.0         2.5 Versicolor
102          5.8         2.7          5.1         1.9 Versicolor
103          7.1         3.0          5.9         2.1 Versicolor
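The same pattern also works when rows should count as duplicates based on only some of the columns. The sketch below is an illustration rather than part of the original walkthrough: it assumes iris_dup is rebuilt from the built-in iris data set (hence the column name Species with a capital S), and the choice of key columns is arbitrary.

```r
# Assumption: rebuild a stand-in for iris_dup from the built-in iris data.
iris_dup <- rbind(iris[c(1:3, 51:53, 101:103), ], iris[51:53, ])

# Hypothetical key: treat rows as duplicates when these columns match.
key_cols <- c("Sepal.Width", "Species")

# duplicated() looks only at the key columns, but the result is used
# to index the full data frame, so all columns are kept.
dedup_by_key <- iris_dup[!duplicated(iris_dup[key_cols]), ]

nrow(dedup_by_key)
```

Indexing with iris_dup[key_cols] rather than iris_dup[, key_cols] keeps the result a data frame even if key_cols holds a single column name.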

2. unique() function

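This is more direct: unique() removes the duplicate rows in a single call and returns the same result as iris_dup[!duplicated(iris_dup),].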

> unique(iris_dup)
    Sepal.Length Sepal.Width Petal.Length petal.Width    species
1            5.1         3.5          1.4         0.2     Setosa
2            4.9         3.0          1.4         0.2     Setosa
3            4.7         3.2          1.3         0.2     Setosa
51           7.0         3.2          4.7         1.4  Virginica
52           6.4         3.2          4.5         1.5  Virginica
53           6.9         3.1          4.9         1.5  Virginica
101          6.3         3.3          6.0         2.5 Versicolor
102          5.8         2.7          5.1         1.9 Versicolor
103          7.1         3.0          5.9         2.1 Versicolor
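As a quick sanity check, the two approaches can be compared directly. This sketch assumes iris_dup is rebuilt from the built-in iris data set; by_unique and by_flags are illustrative names.

```r
# Assumption: rebuild a stand-in for iris_dup from the built-in iris data.
iris_dup <- rbind(iris[c(1:3, 51:53, 101:103), ], iris[51:53, ])

by_unique <- unique(iris_dup)                  # one-call approach
by_flags  <- iris_dup[!duplicated(iris_dup), ] # logical-vector approach

identical(by_unique, by_flags)  # TRUE when both give the same data frame
nrow(by_unique)                 # rows remaining after de-duplication
```

Which form to use is mostly a matter of taste: unique() is shorter, while the duplicated() vector is useful when the duplicates themselves need to be inspected or counted first.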
