Work with presets for get_diver_data()
. This automates selecting, filtering, and joining cBases.
presets()
get_preset(preset)
The function presets()
returns a data.frame with available presets, as defined in the secrets YAML file under db.presets
.
The function get_preset()
will return all the details of a preset as a list.
This YAML information should be put into the YAML file that is read using read_secret()
. Afterwards, the name of the preset can be used as get_diver_data(preset = "...")
.
The most basic YAML form:
The most extensive YAML form:
name_new_preset:
cbase: "location/to/name.cbase"
date_col: "ColumnNameDate"
filter: ColumnName1 %in% c("Filter1", "Filter2") & ColumnName2 %in% c("Filter3", "Filter4")
select: ColumnName1, ColumnName2, ColumnNameReceiptDate, new_name = OldName
join:
cbase: "location/to/another.cbase"
by: ColumnName1, ColumnName2
type: "left"
filter: ColumnName1 == "abc"
select: ColumnName1, col_name_2 = ColumnName2, ColumnName3, everything(), !starts_with("abc")
wide_names_from: ColumnName3
join2:
cbase: "location/to/yet_another.cbase"
by: ColumnName1, ColumnName2
type: "left"
select: ColumnName1, ColumnName2, ColumnName3
wide_names_from: ColumnName3
wide_name_trans: gsub("_", "..", .x)
For all presets, cbase
and date_col
are required.
The YAML keys run in this order:
Download cBase and filter (applied in WHERE
statement)
Select
Join(s)
Post-processing
After this, the arguments in get_diver_data()
will run:
Post-WHERE if post_where
is set, using filter()
Distinct if distinct = TRUE
, using distinct()
Auto-transform if autotransform = TRUE
, using auto_transform()
cbase
)This cBase must be a filepath and must exist on the Diver server. For joins, this can also be another type of file, see Joins
select
)Input for select
will be passed on to select()
, meaning that column names can be used, but also tidyselect
functions such as everything()
.
filter
)Input for filter
will be passed on to filter()
.
join
)For joins, you must set at least cbase
, by
, and type
("left", "right", "inner", etc., see here).
Other files than a cBase in the field cbase
will be imported using certetoolbox::import()
, such as an Excel or CSV file. This can be any file on any local or remote location (even live internet files). For example:
join:
cbase: "location/to/excel_file.xlsx"
by: ColumnName1
type: "left"
join:
cbase: "https://github.com/certe-medical-epidemiology/certegis/blob/main/data/geo_gemeenten.rda"
by: ColumnName1
type: "left"
An unlimited number of joins can be used, but all so-called 'keys' must be unique and start with join
, e.g. joinA
/ joinB
or join
/ join2
/ join3
.
If wide_names_from
is set, the dataset is first transformed to long format using the columns specified in by
, and then reshaped to wide format with values in wide_names_from
.
Use wide_name_trans
to transform the values in wide_names_from
before the reshaping to wide format is applied. Use .x
for the column values. As this will be applied before the data is transformed to a wide format, it allows to refine the values in wide_names_from
using e.g., case_when()
.
post-processing
)Any data transformation can be done after the data have been downloaded and processed according to all previous steps. Use x
to indicate the data set.
name_new_preset:
cbase: "location/to/name.cbase"
date_col: "ColumnNameDate"
post-processing: |
x |>
mutate(Column1 = case_when(Column2 = "A" ~ 1,
Column3 = "B" ~ 2,
TRUE ~ 3))
As shown, in YAML the |
character can be used to start a multi-line statement.
The post-processing
field can also be an R file path, which will be sourced:
name_new_preset:
cbase: "location/to/name.cbase"
date_col: "ColumnNameDate"
post-processing: "location/to/file.R"
name_new_preset:
cbase: "location/to/name.cbase"
date_col: "ColumnNameDate"
post-processing1: "location/to/file 1.R"
post-processing2: "location/to/file 2.R"
Note that all post-processing steps will run directly after the querying the data and thus before auto-transformation if autotransform = TRUE
in get_diver_data()
.