
Data files that we receive sometimes contain rows with duplicated IDs. This is a problem because each ID should be unique. This function compares the two records that share an ID and finds the columns in which their values differ.

  • It is useful for deciding which row to keep when two rows share the same ID (e.g. after using get_dup_id()).

  • This function handles 'NA' values when comparing columns. By default, most available functions return 'NA' whenever a compared column contains a missing value. However, we also want to identify the columns where one record has 'NA' and the other has a value.
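The 'NA' behaviour described above can be illustrated with a minimal base R sketch. This is not the implementation of get_diff_cols(); the data frame `dup` and its columns are hypothetical, and the comparison rule is an assumption based on the description (two values differ when exactly one is 'NA', or when both are present and unequal).

```r
# Two hypothetical records that share the same ID.
dup <- data.frame(
  id   = c(1, 1),
  age  = c(30, 30),
  city = c("Lund", NA),
  sex  = c(NA, NA),
  stringsAsFactors = FALSE
)

# A plain comparison propagates NA, so the difference in "city" is not flagged.
naive <- vapply(dup, function(col) col[1] != col[2], logical(1))

# NA-aware comparison: values differ when exactly one of them is NA,
# or when both are present and unequal.
differs <- vapply(dup, function(col) {
  a <- col[1]
  b <- col[2]
  if (is.na(a) || is.na(b)) {
    return(xor(is.na(a), is.na(b)))
  }
  a != b
}, logical(1))

# Keep only the columns whose values differ between the two rows.
diff_cols <- dup[, differs, drop = FALSE]
```

In this sketch, `naive` is 'NA' for "city", whereas `differs` marks "city" as different and treats "sex" (missing in both rows) as equal.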

Usage

get_diff_cols(data_file, id_str, id_num)

Arguments

data_file

A data file that contains duplicated IDs.

id_str

A string giving the name of the ID column, e.g. "lopnr".

id_num

A specific ID that the user wants to examine.

Value

A tibble containing the columns whose values differ between the two records.

Note

This function can only compare two rows. It does not work if an ID appears in more than two rows.
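Because of this limit, it may help to check how many rows each ID has before comparing. The following is a minimal base R sketch, assuming a hypothetical vector `ids` that holds the ID column; it is not part of this package.

```r
# Hypothetical ID column: ID 1 appears twice, ID 3 appears three times.
ids <- c(1, 1, 2, 3, 3, 3)

# Count the rows per ID.
counts <- table(ids)

# IDs with exactly two rows can be compared pairwise.
comparable_ids <- names(counts)[counts == 2]

# IDs with more than two rows need to be handled separately.
too_many <- names(counts)[counts > 2]
```

Note that names() returns the IDs as character strings, so they may need converting back to the original type before use.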

Examples

if (FALSE) {
# Get the duplicated IDs, then compare the two rows that share the first one.
dup_id <- get_dup_id(df = df_dup_id, id_str = "id")$replicated_id
get_diff_cols(data_file = df_dup_id, id_str = "id", id_num = dup_id[1])
}