# What strategies did you use to eliminate NA

### Substituting with feature average

Step 1 – Get the column mean

```
> s_data
   s_1 s_2 s_3
1  103 102 113
2  101 108 122
3   98 106  88
4  101 102 106
5   98  NA 101
6  108 103  95
7  100  99 106
8   94  80 107
9   97 104  99
10  93  96  97
```
```
> mean(s_data$s_2, na.rm = TRUE)
[1] 100
```
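
When several columns may contain NAs, it can be handy to locate them and compute all the column means in one go. The following is a minimal sketch using base R's `is.na()` and `colMeans()`; the data frame is rebuilt here only so the snippet runs on its own, and `col_means` is just an illustrative name.

```r
# Rebuild the example data so this snippet is self-contained
s_data = data.frame(
  s_1 = c(103, 101, 98, 101, 98, 108, 100, 94, 97, 93),
  s_2 = c(102, 108, 106, 102, NA, 103, 99, 80, 104, 96),
  s_3 = c(113, 122, 88, 106, 101, 95, 106, 107, 99, 97)
)

# Row/column positions of every NA in the data frame
which(is.na(s_data), arr.ind = TRUE)

# Mean of every column at once, ignoring NAs
col_means = colMeans(s_data, na.rm = TRUE)
col_means
```

For this data the three means come out to 99.3, 100 and 103.4.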

Step 2 – Replace the NA with the column mean

```
> s_data[5, "s_2"] = mean(s_data$s_2, na.rm = TRUE)
```
```
> s_data
   s_1 s_2 s_3
1  103 102 113
2  101 108 122
3   98 106  88
4  101 102 106
5   98 100 101
6  108 103  95
7  100  99 106
8   94  80 107
9   97 104  99
10  93  96  97
```

If you want to do this programmatically (for any number of rows and columns), use a function like this:

```
replaceNA = function(data) {
  # Loop through each column
  for (var in seq_len(ncol(data))) {
    # Get the mean of the column, ignoring NAs
    col_mean = mean(data[, var], na.rm = TRUE)
    # Replace the NAs with the column mean
    # is.na(data[, var]) gives the TRUE/FALSE vector marking where the value is NA
    data[is.na(data[, var]), var] = col_mean
  }
  # Return the dataset
  data
}
```
```
> replaceNA(s_data)
   s_1 s_2 s_3
1  103 102 113
2  101 108 122
3   98 106  88
4  101 102 106
5   98 100 101
6  108 103  95
7  100  99 106
8   94  80 107
9   97 104  99
10  93  96  97
```
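
The same idea can also be written without an explicit loop. The sketch below is one possible variant, not a replacement for the function above: `replace_na_numeric` is a hypothetical name, and it assumes that only numeric columns should be imputed, so character or factor columns are left untouched.

```r
# Rebuild the example data so this snippet is self-contained
s_data = data.frame(
  s_1 = c(103, 101, 98, 101, 98, 108, 100, 94, 97, 93),
  s_2 = c(102, 108, 106, 102, NA, 103, 99, 80, 104, 96),
  s_3 = c(113, 122, 88, 106, 101, 95, 106, 107, 99, 97)
)

# Hypothetical helper: fill NAs with the column mean, numeric columns only
replace_na_numeric = function(data) {
  # TRUE for each column that holds numeric values
  numeric_cols = vapply(data, is.numeric, logical(1))
  # Impute each numeric column independently
  data[numeric_cols] = lapply(data[numeric_cols], function(col) {
    col[is.na(col)] = mean(col, na.rm = TRUE)
    col
  })
  data
}

filled = replace_na_numeric(s_data)
filled[5, "s_2"]
# [1] 100
```

Skipping non-numeric columns avoids the warning that `mean()` would raise on, say, a column of names.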

### Eliminate the entire observation

If the number of observations with NAs is quite small, the easy way is to get rid of the entire observation.

For example, getting rid of the 5th row above, which contains an NA, is very easy. Just use the function na.omit().

```
> s_data
   s_1 s_2 s_3
1  103 102 113
2  101 108 122
3   98 106  88
4  101 102 106
5   98  NA 101
6  108 103  95
7  100  99 106
8   94  80 107
9   97 104  99
10  93  96  97
```
```
> na.omit(s_data)
   s_1 s_2 s_3
1  103 102 113
2  101 108 122
3   98 106  88
4  101 102 106
6  108 103  95
7  100  99 106
8   94  80 107
9   97 104  99
10  93  96  97
```

As you can see, the 5th row is gone.
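
Before dropping rows, it can help to check how many would be lost. This is a minimal sketch using base R's `complete.cases()`; the data frame is rebuilt so the snippet runs on its own, and the variable names `complete` and `cleaned` are only illustrative.

```r
# Rebuild the example data so this snippet is self-contained
s_data = data.frame(
  s_1 = c(103, 101, 98, 101, 98, 108, 100, 94, 97, 93),
  s_2 = c(102, 108, 106, 102, NA, 103, 99, 80, 104, 96),
  s_3 = c(113, 122, 88, 106, 101, 95, 106, 107, 99, 97)
)

# TRUE for rows that have no NA in any column
complete = complete.cases(s_data)

# Number of rows that na.omit() would drop
sum(!complete)
# [1] 1

# Which rows those are
which(!complete)
# [1] 5

# Drop them and confirm the new row count
cleaned = na.omit(s_data)
nrow(cleaned)
# [1] 9
```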