When we received data files, some rows contain duplicated IDs. This is a problem as ID should be unique. This function finds the columns with different values by comparing the two records with the same ID.
It is useful for deciding which row to keep when we find there are two duplicated IDs (e.g. after using
get_dup_id()
.This function conquers the problem 'NA' values when comparing the columns with the basic logical operators, e.g '==' and '!='. By default, they return 'NA' as long as there is any 'NA' value in the columns. However, we also want to know if the columns that one record gives "NA" and the other has values.
Note
This function can only compare 2 rows. It does not work if an ID is duplicated for more than 2 rows.
Examples
if (FALSE) { # \dontrun{
dup_id <- get_dup_id(df = df_dup_id, id_str = "id")$replicated_id
get_diff_cols(data = df_dup_id, id_str = "id", id_num = dup_id[1])
} # }