How to find elements in one vector that are not in another vector


Say you have 2 vectors,

> all_cities = c("hyderabad","delhi","mumbai","koklata","chennai")
> south_cities = c("hyderabad","chennai")

How do you find the cities that are not in the south (call them the north cities)?

> north_cities = setdiff(all_cities,south_cities)
> north_cities
[1] "delhi"   "mumbai"  "koklata"

setdiff() is one of R's base set operations, alongside union(), intersect() and setequal(). setdiff(x, y) returns the elements of x that are not in y.
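One detail worth knowing is that these functions treat their arguments as sets, so duplicated values are dropped from the result. Here is a minimal sketch comparing setdiff() with logical indexing via %in%, which keeps duplicates. The vector `visits` is hypothetical and not part of the example above.

```r
# hypothetical data: a vector with a repeated value
visits <- c("delhi", "mumbai", "delhi", "chennai")
south_cities <- c("hyderabad", "chennai")

# setdiff() treats its arguments as sets, so "delhi" appears only once
as_set <- setdiff(visits, south_cities)

# indexing with %in% keeps every element that is not in south_cities,
# including the repeated "delhi"
with_dups <- visits[!(visits %in% south_cities)]

print(as_set)
print(with_dups)
```

If the number of occurrences matters, the %in% form is probably the one you want; if you only care about which values are present, setdiff() is shorter.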

In this case, all_cities is a superset and south_cities is a subset. The two vectors do not have to be related that way; they can also overlap only partially. For example, think of the cities with

  • a population > 10M
  • a metro rail system

> cities_10m = c("hyderabad","delhi","mumbai","koklata","chennai")
> cities_metro = c("hyderabad","delhi","chennai","bangalore")

# cities with a metro but without a population of 10M
> setdiff(cities_metro,cities_10m)
[1] "bangalore"

# cities with a population of 10M but without a metro
> setdiff(cities_10m,cities_metro)
[1] "mumbai"  "koklata"

Now you should be able to see why setdiff() is an asymmetric function: setdiff(x, y) and setdiff(y, x) generally return different results.
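The asymmetry can be checked directly by swapping the arguments. The sketch below also combines the two one-sided differences into a symmetric difference, that is, the cities that appear in exactly one of the two vectors. Note that `sym_diff` is a hypothetical helper name used here for illustration, not a base R function.

```r
cities_10m   <- c("hyderabad", "delhi", "mumbai", "koklata", "chennai")
cities_metro <- c("hyderabad", "delhi", "chennai", "bangalore")

# the two argument orders answer different questions
only_metro <- setdiff(cities_metro, cities_10m)  # metro, but not 10M
only_10m   <- setdiff(cities_10m, cities_metro)  # 10M, but no metro

# hypothetical helper: elements that are in exactly one of x and y
sym_diff <- function(x, y) union(setdiff(x, y), setdiff(y, x))

in_exactly_one <- sym_diff(cities_10m, cities_metro)
print(in_exactly_one)
```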

# cities that have both a population of 10M and a metro
> intersect(cities_10m,cities_metro)
[1] "hyderabad" "delhi"     "chennai"

# cities with either a population of 10M or a metro (or both)
> union(cities_metro,cities_10m)
[1] "hyderabad" "delhi"     "chennai"
[4] "bangalore" "mumbai"    "koklata"
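As a sanity check, the results fit together: every city in the union falls into exactly one of the two differences or the intersection. A small sketch of that check, using setequal() because it compares contents regardless of order:

```r
cities_10m   <- c("hyderabad", "delhi", "mumbai", "koklata", "chennai")
cities_metro <- c("hyderabad", "delhi", "chennai", "bangalore")

# the three non-overlapping pieces of the two vectors
pieces <- c(
  setdiff(cities_10m, cities_metro),    # 10M only
  intersect(cities_10m, cities_metro),  # both
  setdiff(cities_metro, cities_10m)     # metro only
)

# together the pieces contain the same cities as the union ...
same_contents <- setequal(pieces, union(cities_10m, cities_metro))

# ... and no city is counted twice (anyDuplicated() returns 0 if none)
no_overlap <- anyDuplicated(pieces) == 0

print(same_contents)
print(no_overlap)
```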
