Available Presets for get_diver_data()

Work with presets for get_diver_data(). This automates selecting, filtering, and joining cBases.

presets()

get_preset(preset)

Arguments

preset: name of the preset

Details

The function presets() returns a data.frame with available presets, as defined in the secrets YAML file under db.presets.

The function get_preset() will return all the details of a preset as a list.

Required YAML Format

This YAML information should be put into the YAML file that is read using read_secret(). Afterwards, the name of the preset can be used as get_diver_data(preset = "...").

The most basic YAML form:

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"

The most extensive YAML form:

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  filter: ColumnName1 %in% c("Filter1", "Filter2") & ColumnName2 %in% c("Filter3", "Filter4")
  select: ColumnName1, ColumnName2, ColumnNameReceiptDate, new_name = OldName
  join:
    cbase: "location/to/another.cbase"
    by: ColumnName1, ColumnName2
    type: "left"
    filter: ColumnName1 == "abc"
    select: ColumnName1, col_name_2 = ColumnName2, ColumnName3, everything(), !starts_with("abc")
    wide_names_from: ColumnName3
  join2:
    cbase: "location/to/yet_another.cbase"
    by: ColumnName1, ColumnName2
    type: "left"
    select: ColumnName1, ColumnName2, ColumnName3
    wide_names_from: ColumnName3
    wide_name_trans: gsub("_", "..", .x)
  post-processing:
    - "file 1.R"
    - "file 2.R"

For all presets, cbase and date_col are required.

Order of running

The YAML keys run in this order:

Download cBase and filter (applied in WHERE statement)
Select
Join(s)
Post-processing

After this, the arguments in get_diver_data() will run:

Post-WHERE if post_where is set, using filter()
Distinct if distinct = TRUE, using distinct()
Auto-transform if autotransform = TRUE, using auto_transform()

cBase (`cbase`)

This cBase must be a filepath and must exist on the Diver server. For joins, this can also be another type of file, see Joins

Select (`select`)

Input for select will be passed on to select(), meaning that column names can be used, but also tidyselect functions such as everything().

Filters (`filter`)

Input for filter will be passed on to filter().

Joins (`join`)

For joins, you must set at least cbase, by, and type ("left", "right", "inner", etc., see here).

Other files than a cBase in the field cbase will be imported using certetoolbox::import(), such as an Excel or CSV file. This can be any file on any local or remote location (even live internet files). For example:

join:
  cbase: "location/to/excel_file.xlsx"
  by: ColumnName1
  type: "left"


join:
  cbase: "https://github.com/certe-medical-epidemiology/certegis/blob/main/data/geo_gemeenten.rda"
  by: ColumnName1
  type: "left"

An unlimited number of joins can be used, but all so-called 'keys' must be unique and start with join, e.g. joinA / joinB or join / join2 / join3.

If wide_names_from is set, the dataset is first transformed to long format using the columns specified in by, and then reshaped to wide format with values in wide_names_from.

Use wide_name_trans to transform the values in wide_names_from before the reshaping to wide format is applied. Use .x for the column values. As this will be applied before the data is transformed to a wide format, it allows to refine the values in wide_names_from using e.g., case_when().

Post-processing (`post-processing`)

Any data transformation can be done after the data have been downloaded and processed according to all previous steps. Use x to indicate the data set.

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  post-processing: |
    x |>
      mutate(Column1 = case_when(Column2 = "A" ~ 1,
                                 Column3 = "B" ~ 2,
                                 TRUE ~ 3))

As shown, in YAML the | character can be used to start a multi-line statement.

The post-processing field can also be one or more R file paths, which will be read using readLines(). Each of these chunks is valid:

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  post-processing: "location/to/file.R"

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  post-processing:
    - "location/to/file 1.R"
    - "location/to/file 2.R"

name_new_preset:
  cbase: "location/to/name.cbase"
  date_col: "ColumnNameDate"
  post-processing1: "location/to/file 1.R"
  post-processing2: "location/to/file 2.R"

Note that all post-processing steps will run directly after the querying the data and thus before auto-transformation if autotransform = TRUE in get_diver_data().

Available Presets for get_diver_data()

Arguments

Details

Required YAML Format

Order of running

cBase (cbase)

Select (select)

Filters (filter)

Joins (join)

Post-processing (post-processing)

Available Presets for `get_diver_data()`

cBase (`cbase`)

Select (`select`)

Filters (`filter`)

Joins (`join`)

Post-processing (`post-processing`)