Work with presets for get_diver_data(). This automates selecting, filtering, and joining cBases.

presets()

get_preset(preset)

Arguments

preset

name of the preset

Details

The function presets() returns a data.frame with available presets, as defined in the secrets YAML file under db.presets.

The function get_preset() will return all the details of a preset as a list.

Required YAML Format

This YAML information should be put into the YAML file that is read using read_secret(). Afterwards, the name of the preset can be used as get_diver_data(preset = "...").

The most basic YAML form:

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"

The most extensive YAML form:

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  filter: ColumnName1 %in% c("Filter1", "Filter2") & ColumnName2 %in% c("Filter3", "Filter4")
  select: ColumnName1, ColumnName2, ColumnNameReceiptDate, new_name = OldName
  join:
    cbase: "location/to/another.cbase"
    by: ColumnName1, ColumnName2
    type: "left"
    filter: ColumnName1 == "abc"
    select: ColumnName1, col_name_2 = ColumnName2, ColumnName3, everything(), !starts_with("abc")
    wide_names_from: ColumnName3
  join2:
    cbase: "location/to/yet_another.cbase"
    by: ColumnName1, ColumnName2
    type: "left"
    select: ColumnName1, ColumnName2, ColumnName3
    wide_names_from: ColumnName3
    wide_name_trans: gsub("_", "..", .x)

For all presets, cbase and date_col are required.

Order of running

The YAML keys run in this order:

  1. Download cBase and filter (applied in WHERE statement)

  2. Select

  3. Join(s)

  4. Post-processing

After this, the arguments in get_diver_data() will run:

  1. Post-WHERE if post_where is set, using filter()

  2. Distinct if distinct = TRUE, using distinct()

  3. Auto-transform if autotransform = TRUE, using auto_transform()

cBase (cbase)

This cBase must be a filepath and must exist on the Diver server. For joins, this can also be another type of file, see Joins

Select (select)

Input for select will be passed on to select(), meaning that column names can be used, but also tidyselect functions such as everything().

Filters (filter)

Input for filter will be passed on to filter().

Joins (join)

For joins, you must set at least cbase, by, and type ("left", "right", "inner", etc., see here).

Other files than a cBase in the field cbase will be imported using certetoolbox::import(), such as an Excel or CSV file. This can be any file on any local or remote location (even live internet files). For example:

join:
  cbase: "location/to/excel_file.xlsx"
  by: ColumnName1
  type: "left"


join:
  cbase: "https://github.com/certe-medical-epidemiology/certegis/blob/main/data/geo_gemeenten.rda"
  by: ColumnName1
  type: "left"

An unlimited number of joins can be used, but all so-called 'keys' must be unique and start with join, e.g. joinA / joinB or join / join2 / join3.

If wide_names_from is set, the dataset is first transformed to long format using the columns specified in by, and then reshaped to wide format with values in wide_names_from.

Use wide_name_trans to transform the values in wide_names_from before the reshaping to wide format is applied. Use .x for the column values. As this will be applied before the data is transformed to a wide format, it allows to refine the values in wide_names_from using e.g., case_when().

Post-processing (post-processing)

Any data transformation can be done after the data have been downloaded and processed according to all previous steps. Use x to indicate the data set.

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  post-processing: |
    x |>
      mutate(Column1 = case_when(Column2 = "A" ~ 1,
                                 Column3 = "B" ~ 2,
                                 TRUE ~ 3))

As shown, in YAML the | character can be used to start a multi-line statement.

The post-processing field can also be an R file path, which will be sourced:

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  post-processing: "location/to/file.R"

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  post-processing1: "location/to/file 1.R"
  post-processing2: "location/to/file 2.R"

Note that all post-processing steps will run directly after the querying the data and thus before auto-transformation if autotransform = TRUE in get_diver_data().