
This function joins two tibbles and, when both contain a column with the same name, generates a single column that records the non-NA values.

Usage

get_unique_join(tibble_names, join_type, by_cols)

Arguments

tibble_names

A character vector with the names of two tibbles in your R environment.

join_type

The type of join to perform. It should be one of 'full_join', 'left_join', or 'inner_join'.

by_cols

A string listing the join columns in a format that `join_by` can interpret, e.g. "Lopnr == N1lopnr, age".
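How the function turns the `by_cols` string into a join specification is not described on this page. As a rough sketch only, one possible way is to wrap the string in a `join_by()` call and evaluate it; the variable names below are illustrative and this is not necessarily what `get_unique_join` does internally.

```r
library(dplyr)

# The same string used in the Examples section.
by_cols <- "Lopnr == N1lopnr, age"

# Assumption: build the text "join_by(Lopnr == N1lopnr, age)" and evaluate it.
# The result is a join specification that can be passed to the `by` argument
# of left_join(), full_join() or inner_join().
by_spec <- eval(parse(text = paste0("join_by(", by_cols, ")")))

print(by_spec)
```

Here `Lopnr == N1lopnr` matches differently named key columns in the two tibbles, while the bare `age` matches a column with the same name in both.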

Value

A tibble in which each column shared by the two tables appears only once and records the non-NA values.

Details

This function was motivated by Bolin's work extracting SNAC-N variables: for wave 3, the information from different sources overlaps.

For example, for physician variables, there are 3 files:

  • All wave 3 participants.

  • Only cohort 1's follow-up at wave 3.

  • Only cohort 2's baseline.

Both participants and variables overlap across these files. In addition, for the same participant and the same variable, a value can be NA in one file but not NA in the other. With a plain join function, each common column is split into '.x' and '.y' versions. We then have to use coalesce to pick the value that is not NA, and finally delete the '.x' and '.y' columns.
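The manual workflow described above can be sketched as follows. The two small tibbles are hypothetical and their values are invented for illustration; only the column names (`Lopnr`, `N1lopnr`, `age`, `ph121`) are taken from the Examples section.

```r
library(dplyr)

# Hypothetical toy data (invented values): both tibbles contain 'ph121'.
wave3 <- tibble(Lopnr = c(10, 11, 12), age = c(78, 81, 84), ph121 = c(NA, 1, 0))
fu    <- tibble(N1lopnr = c(10, 11), age = c(78, 81), ph121 = c(1, NA))

# Step 1: a plain join splits the shared column into ph121.x and ph121.y.
joined <- full_join(wave3, fu, by = join_by(Lopnr == N1lopnr, age))

# Step 2: take the first non-NA value per row.
# Step 3: drop the suffixed '.x' and '.y' columns.
combined <- joined %>%
  mutate(ph121 = coalesce(ph121.x, ph121.y)) %>%
  select(-ph121.x, -ph121.y)

print(combined)
```

With many shared columns these three steps have to be repeated for each one, which is the repetition the function is meant to remove.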

To avoid this repetitive work, Bolin wrote this function to combine the information from the different data files.

Examples

if (FALSE) {

## fake_snacn_ph_wave3 and fake_snacn_ph_fu contain the same variable 'ph121',
## but for some observations it is NA in one file and not NA in the other, e.g. Lopnr = 10.

left_join(fake_snacn_ph_wave3, fake_snacn_ph_fu, by = join_by(Lopnr == N1lopnr, age))

## Combine the information from both files and keep only the non-NA value.


get_unique_join(tibble_names = c("fake_snacn_ph_wave3", "fake_snacn_ph_fu"),
               join_type = "full_join",
               by_cols = "Lopnr == N1lopnr, age")
}