These are simple functions for distribution metrics and prediction error metrics.

se(x, na.rm = getOption("na.rm", FALSE))

ci(x, level = 0.95, na.rm = getOption("na.rm", FALSE))

sum_of_squares(x, correct_mean = TRUE, na.rm = getOption("na.rm", FALSE))

cv(x, na.rm = getOption("na.rm", FALSE))

cqv(x, na.rm = getOption("na.rm", FALSE))

mse(actual, predicted, na.rm = getOption("na.rm", FALSE))

mape(actual, predicted, na.rm = getOption("na.rm", FALSE))

rmse(actual, predicted, na.rm = getOption("na.rm", FALSE))

mae(actual, predicted, na.rm = getOption("na.rm", FALSE))

z_score(x, na.rm = getOption("na.rm", FALSE))

midhinge(x, na.rm = getOption("na.rm", FALSE))

ewma(x, lambda, na.rm = getOption("na.rm", FALSE))

rr_ewma(x, lambda, na.rm = getOption("na.rm", FALSE))

normalise(n, n_ref, per = 1000)

normalize(n, n_ref, per = 1000)

scale_sd(x)

centre_mean(x)

percentiles(x, na.rm = getOption("na.rm", FALSE))

deciles(x, na.rm = getOption("na.rm", FALSE))

Arguments

x

values

na.rm

a logical indicating whether missing values (NA) should be removed from x

level

confidence level, defaults to 0.95 (95%)

correct_mean

with TRUE (the default), a correction for the mean is applied: the mean of x is subtracted from each value before squaring and summing, so the result describes the spread of x around its mean. With FALSE, all x^2 are simply added together, so the result also depends on the location of x (how far its values lie from 0).

actual

Vector of actual values

predicted

Vector of predicted values

lambda

smoothing parameter, a value between 0 and 1. A value of 0 returns x unchanged; a value of 1 returns the mean of x. The EWMA only looks back and therefore lags behind the data; the rrEWMA compensates for this delay by taking the mean of a 'forward' and a 'backward' EWMA.

n

number to be normalised

n_ref

reference to use for normalisation

per

normalisation factor, i.e. the unit in which the result is expressed (defaults to 1000, giving a value 'per 1000')
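To illustrate the two modes of correct_mean, here is a minimal base R sketch. The function name sum_of_squares_sketch() is hypothetical and this is not the certestats source; it only assumes the behaviour described for correct_mean (subtract the mean, then square and sum).

```r
# Hypothetical helper illustrating correct_mean; not the certestats source
sum_of_squares_sketch <- function(x, correct_mean = TRUE, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  # With correct_mean = TRUE, centre x on its mean before squaring
  if (correct_mean) x <- x - mean(x)
  sum(x^2)
}

x <- c(2, 4, 6)
sum_of_squares_sketch(x)                              # (-2)^2 + 0^2 + 2^2 = 8
sum_of_squares_sketch(x, correct_mean = FALSE)        # 2^2 + 4^2 + 6^2 = 56
sum_of_squares_sketch(x + 100)                        # still 8: only the spread counts
sum_of_squares_sketch(x + 100, correct_mean = FALSE)  # 102^2 + 104^2 + 106^2 = 32456
```

With correct_mean = TRUE the result equals (length(x) - 1) * var(x). Shifting x changes only the uncorrected sum, which is why that variant also reflects where the values lie.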
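A minimal base R sketch of a forward EWMA and of the rrEWMA as the mean of a forward and a backward pass. The names ewma_sketch() and rr_ewma_sketch() are hypothetical, and the recursion s[1] = x[1], s[i] = lambda * s[i - 1] + (1 - lambda) * x[i] is an assumed convention: under it, lambda = 0 returns x unchanged and larger values smooth more heavily. The certestats implementation may initialise or weight the series differently.

```r
# Hypothetical sketch; assumed recursion: s[1] = x[1], s[i] = lambda * s[i - 1] + (1 - lambda) * x[i]
ewma_sketch <- function(x, lambda, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  stopifnot(is.numeric(lambda), length(lambda) == 1, lambda >= 0, lambda <= 1)
  s <- numeric(length(x))
  if (length(x) == 0) return(s)
  s[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    # Each smoothed value mixes the previous smoothed value with the new observation
    s[i] <- lambda * s[i - 1] + (1 - lambda) * x[i]
  }
  s
}

# rrEWMA sketch: average a forward pass with a backward pass
# (the EWMA of the reversed series, reversed back into the original order)
rr_ewma_sketch <- function(x, lambda, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  forward  <- ewma_sketch(x, lambda)
  backward <- rev(ewma_sketch(rev(x), lambda))
  (forward + backward) / 2
}

ewma_sketch(c(1, 2, 3), lambda = 0.5)     # 1.00 1.50 2.25
rr_ewma_sketch(c(1, 2, 3), lambda = 0.5)  # 1.375 2.000 2.625
```

The forward pass trails the data (1.50 and 2.25 instead of 2 and 3), while the backward pass leads it, so their average roughly cancels the lag of a one-sided EWMA.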

Details

The functions calculate the following:

  • se() calculates the standard error of the mean: sd(x) / sqrt(length(x))

  • ci() calculates the confidence interval for the mean (95% by default) and returns a vector of length 2: the lower and upper bound

  • sum_of_squares() calculates the sum of (x - mean(x)) ^ 2, or the sum of x ^ 2 when correct_mean = FALSE

  • cv() calculates the coefficient of variation: standard deviation / mean

  • cqv() calculates the coefficient of quartile variation: (Q3 - Q1) / (Q3 + Q1)

  • mse() calculates the mean squared error

  • mape() calculates the mean absolute percentage error

  • rmse() calculates the root mean squared error

  • mae() calculates the mean absolute error

  • z_score() calculates the number of standard deviations from the mean: (x - mean) / sd

  • midhinge() calculates the midpoint of the interquartile range, i.e. the mean of the first and third quartile: (Q1 + Q3) / 2

  • ewma() calculates the EWMA (exponentially weighted moving average)

  • rr_ewma() calculates the rrEWMA (reversed-recombined exponentially weighted moving average)

  • normalise() (or its alias normalize()) normalises the data based on a reference: (n / n_ref) * per

  • scale_sd() normalises the data to have a standard deviation of 1, while retaining the mean

  • centre_mean() normalises the data to have a mean of 0, while retaining the standard deviation

  • percentiles() and deciles() take a numeric vector as input and return, for each value, the percentile (1 to 100) or decile (1 to 10) it falls in; tied values all receive the lowest one
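The formulas for se(), ci(), cv(), cqv(), midhinge() and z_score() can be sketched in base R as follows. All *_sketch names are hypothetical and this is not the certestats source. Two points are assumptions: ci() is shown as a t-based interval, and quartiles come from base R quantile() with type = 7; the package may use a different method or quantile type.

```r
# Hypothetical sketches of the formulas listed above; not the certestats source
se_sketch <- function(x) sd(x) / sqrt(length(x))

# Assumption: a t-based interval around the mean; returns c(lower, upper)
ci_sketch <- function(x, level = 0.95) {
  margin <- qt(1 - (1 - level) / 2, df = length(x) - 1) * se_sketch(x)
  c(mean(x) - margin, mean(x) + margin)
}

cv_sketch <- function(x) sd(x) / mean(x)

# Assumption: Q1 and Q3 from base R quantile() with type = 7
quartiles_sketch <- function(x) unname(quantile(x, probs = c(0.25, 0.75), type = 7))

cqv_sketch <- function(x) {
  q <- quartiles_sketch(x)
  (q[2] - q[1]) / (q[2] + q[1])
}

midhinge_sketch <- function(x) mean(quartiles_sketch(x))

z_score_sketch <- function(x) (x - mean(x)) / sd(x)

x <- c(1, 2, 3, 4, 5)
se_sketch(x)        # sqrt(0.5), about 0.707
ci_sketch(x)        # same bounds as t.test(x)$conf.int
cqv_sketch(x)       # (4 - 2) / (4 + 2), about 0.333
midhinge_sketch(x)  # (2 + 4) / 2 = 3
z_score_sketch(x)   # about -1.26 -0.63 0.00 0.63 1.26
```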
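The four error metrics compare actual and predicted values pairwise. A minimal base R sketch with hypothetical *_sketch names, not the certestats source; returning the MAPE as a fraction (0.1 meaning 10%) rather than as a percentage is an assumption.

```r
# Hypothetical sketches; errors are taken pairwise as actual - predicted
mse_sketch <- function(actual, predicted, na.rm = FALSE) {
  mean((actual - predicted)^2, na.rm = na.rm)
}
rmse_sketch <- function(actual, predicted, na.rm = FALSE) {
  sqrt(mse_sketch(actual, predicted, na.rm = na.rm))
}
mae_sketch <- function(actual, predicted, na.rm = FALSE) {
  mean(abs(actual - predicted), na.rm = na.rm)
}
# Assumption: returned as a fraction; undefined when actual contains zeros
mape_sketch <- function(actual, predicted, na.rm = FALSE) {
  mean(abs((actual - predicted) / actual), na.rm = na.rm)
}

actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 40)
mse_sketch(actual, predicted)   # (4 + 4 + 9 + 0) / 4 = 4.25
rmse_sketch(actual, predicted)  # sqrt(4.25), about 2.06
mae_sketch(actual, predicted)   # (2 + 2 + 3 + 0) / 4 = 1.75
mape_sketch(actual, predicted)  # (0.2 + 0.1 + 0.1 + 0) / 4 = 0.1
```

RMSE is in the same unit as the data but weights large errors more than MAE does, which is why the two can rank models differently.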
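The normalisation and rescaling helpers can be sketched as one-liners in base R. The *_sketch names are hypothetical and this is not the certestats source. For scale_sd(), the sketch assumes that 'retaining the mean' means shifting the rescaled values back to the original mean, which may not match how the package does it.

```r
# Hypothetical sketches; not the certestats source
normalise_sketch <- function(n, n_ref, per = 1000) (n / n_ref) * per

# Assumption: rescale to sd 1 around the mean, then shift back so the original mean is kept
scale_sd_sketch <- function(x) (x - mean(x)) / sd(x) + mean(x)

# Subtracting a constant leaves the standard deviation unchanged
centre_mean_sketch <- function(x) x - mean(x)

normalise_sketch(5, 200)  # 5 cases among 200 is 25 per 1000

x <- c(2, 4, 6, 8, 20)
c(mean(scale_sd_sketch(x)), sd(scale_sd_sketch(x)))        # about 8 and 1
c(mean(centre_mean_sketch(x)), sd(centre_mean_sketch(x)))  # 0 and sd(x), about 7.07
```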

Default values of na.rm

The 'certestats' package supports a global default setting for na.rm in many of its mathematical functions. Set it with options(na.rm = TRUE) or options(na.rm = FALSE); if the option is not set, FALSE is used.

For normality(), quantile() and IQR(), this also applies to the type argument. The default, type = 7, is the default of base R. Use type = 6 to comply with SPSS.
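The global default works through the na.rm = getOption("na.rm", FALSE) default shown in the usage lines: the argument falls back to the option, and to FALSE when the option is unset. A minimal sketch of that pattern, using a hypothetical mean_sketch() function that is not part of the package:

```r
# Hypothetical function using the same default pattern as the functions above
mean_sketch <- function(x, na.rm = getOption("na.rm", FALSE)) {
  mean(x, na.rm = na.rm)
}

x <- c(1, 2, NA, 3)
mean_sketch(x)                 # NA, assuming the na.rm option has not been set
old <- options(na.rm = TRUE)   # set the global default; options() returns the old value
mean_sketch(x)                 # 2
mean_sketch(x, na.rm = FALSE)  # NA: an explicit argument always wins
options(old)                   # restore the previous setting
```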

Examples

x <- c(0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 6)
percentiles(x)
#>  [1]   1   8  17  25  33  42  42  42  42  42  42  42 100
deciles(x)
#>  [1]  1  1  2  2  3  5  5  5  5  5  5  5 10

percentiles(rnorm(10))
#>  [1]  22  78   1  67  89 100  11  45  55  34

library(dplyr, warn.conflicts = FALSE)
tib <- as_tibble(matrix(as.integer(runif(40, min = 1, max = 7)), ncol = 4),
                 .name_repair = function(...) LETTERS[1:4])
tib
#> # A tibble: 10 × 4
#>        A     B     C     D
#>    <int> <int> <int> <int>
#>  1     2     1     5     3
#>  2     5     2     3     2
#>  3     5     2     4     2
#>  4     2     4     4     4
#>  5     6     3     1     3
#>  6     5     3     5     5
#>  7     1     5     5     2
#>  8     4     6     6     5
#>  9     5     2     6     1
#> 10     5     2     3     3

# percentiles per column
tib |> mutate(across(everything(), percentiles))
#> # A tibble: 10 × 4
#>        A     B     C     D
#>    <dbl> <dbl> <dbl> <dbl>
#>  1    12     1    56    45
#>  2    45    12    12    12
#>  3    45    12    34    12
#>  4    12    78    34    78
#>  5   100    56     1    45
#>  6    45    56    56    89
#>  7     1    89    56    12
#>  8    33   100    89    89
#>  9    45    12    89     1
#> 10    45    12    12    45