When we receive data files, some rows may contain duplicated IDs. This is a problem because IDs should be unique. This function compares the two records that share an ID and finds the columns in which their values differ.
It is useful for deciding which row to keep when two rows share the same ID (e.g. after using get_dup_id()). This function also handles 'NA' values when comparing columns. Most available functions return 'NA' by default as soon as either of the compared values is missing. However, we also want to identify the columns where one record has "NA" and the other has a value.
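The NA issue described above can be illustrated with a minimal base R sketch. The helper `na_safe_diff_cols()` and the toy data frame are hypothetical and are not the package's implementation; they only show one way to treat "NA versus a value" as a difference and "NA versus NA" as equal.

```r
# Toy data: two rows with the same ID (hypothetical example data).
df <- data.frame(
  id = c(1, 1),
  age = c(30, NA),
  city = c("Lund", "Lund"),
  score = c(NA, NA),
  stringsAsFactors = FALSE
)

# A plain comparison returns NA when either value is missing.
df[1, "age"] != df[2, "age"]
#> [1] NA

# Hypothetical helper (an assumption, not the package's code):
# compare two one-row data frames column by column.
na_safe_diff_cols <- function(row_a, row_b) {
  # Compare values as character so columns of mixed types can be handled alike.
  a <- unlist(lapply(row_a, as.character))
  b <- unlist(lapply(row_b, as.character))
  both_na <- is.na(a) & is.na(b)    # missing in both rows: treated as equal
  one_na <- xor(is.na(a), is.na(b)) # missing in exactly one row: a difference
  differs <- one_na | (!both_na & !one_na & a != b)
  names(row_a)[differs]
}

diff_cols <- na_safe_diff_cols(df[1, ], df[2, ])
diff_cols
#> [1] "age"
```

Comparing as character is a simplification chosen for brevity; a stricter version might compare each column in its own type.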
Arguments
- data_file
A data file containing duplicated IDs.
- id_str
A string giving the name of the ID column, e.g. "lopnr".
- id_num
The specific ID value that the user wants to examine.
Note
This function can only compare two rows. It does not work if an ID appears in more than two rows.
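Because of this two-row limit, it may help to check how often each ID occurs before comparing rows. The following is a minimal base R sketch on made-up data; the data frame and variable names are assumptions for illustration only.

```r
# Toy data (hypothetical): ID 1 appears twice, ID 3 appears three times.
df <- data.frame(
  id = c(1, 1, 2, 3, 3, 3),
  value = c("a", "b", "c", "d", "e", "f")
)

# Count the rows per ID.
id_counts <- table(df$id)

# IDs that appear in exactly two rows can be compared pairwise.
exactly_two <- names(id_counts)[id_counts == 2]
exactly_two
#> [1] "1"

# IDs that appear in more than two rows need to be handled separately.
more_than_two <- names(id_counts)[id_counts > 2]
more_than_two
#> [1] "3"
```

Note that `names()` on a table returns the IDs as character strings, so they may need converting back to the ID column's type before use.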
Examples
if (FALSE) {
dup_id <- get_dup_id(df = df_dup_id, id_str = "id")$replicated_id
get_diff_cols(data = df_dup_id, id_str = "id", id_num = dup_id[1])
}