These are simple distribution metric functions.

se(x, na.rm = getOption("na.rm", FALSE))

ci(x, level = 0.95, na.rm = getOption("na.rm", FALSE))

sum_of_squares(x, correct_mean = TRUE, na.rm = getOption("na.rm", FALSE))

cv(x, na.rm = getOption("na.rm", FALSE))

cqv(x, na.rm = getOption("na.rm", FALSE))

mse(actual, predicted, na.rm = getOption("na.rm", FALSE))

mape(actual, predicted, na.rm = getOption("na.rm", FALSE))

rmse(actual, predicted, na.rm = getOption("na.rm", FALSE))

mae(actual, predicted, na.rm = getOption("na.rm", FALSE))

z_score(x, na.rm = getOption("na.rm", FALSE))

midhinge(x, na.rm = getOption("na.rm", FALSE))

ewma(x, lambda, na.rm = getOption("na.rm", FALSE))

rr_ewma(x, lambda, na.rm = getOption("na.rm", FALSE))

normalise(n, n_ref, per = 1000)

normalize(n, n_ref, per = 1000)

scale_sd(x)

centre_mean(x)

percentiles(x, na.rm = getOption("na.rm", FALSE))

deciles(x, na.rm = getOption("na.rm", FALSE))

Arguments

x

values

na.rm

a logical indicating whether missing values (NA) must be removed from x

level

confidence level, defaults to 0.95 (95%)

correct_mean

with TRUE (the default), a correction for the mean is applied: the mean of x is subtracted from each value before squaring and summing, so the result describes the spread of x around its mean. With FALSE, all x^2 are simply added together, so the result also reflects the location of x.

actual

Vector of actual values

predicted

Vector of predicted values

lambda

smoothing parameter, a value between 0 and 1. With a value of 0 the result is equal to x; with a value of 1 it is equal to the mean of x. The EWMA only looks back and therefore lags behind the data; the rrEWMA takes the mean of a 'forward' and a 'backward' EWMA.

n

number to be normalised

n_ref

reference to use for normalisation

per

normalisation factor
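The effect of correct_mean, per and lambda can be illustrated in base R. This is a minimal sketch, not the package's implementation: the helper names (ss_sketch, normalise_sketch, ewma_sketch, rr_ewma_sketch) are hypothetical, and the EWMA recursion is an assumption chosen only because it reproduces the two documented endpoints (lambda = 0 returns x, lambda = 1 returns the mean of x).

```r
# Hypothetical helpers in base R; they are NOT the package's own code.

# Sum of squares, with or without subtracting the mean first
ss_sketch <- function(x, correct_mean = TRUE) {
  if (correct_mean) {
    sum((x - mean(x))^2)
  } else {
    sum(x^2)
  }
}

# Normalisation against a reference: (n / n_ref) * per
normalise_sketch <- function(n, n_ref, per = 1000) {
  (n / n_ref) * per
}

# ASSUMED recursion: s[t] = lambda * s[t - 1] + (1 - lambda) * x[t],
# started at mean(x). It matches the documented behaviour at lambda = 0 and 1.
ewma_sketch <- function(x, lambda) {
  out <- numeric(length(x))
  prev <- mean(x)
  for (i in seq_along(x)) {
    prev <- lambda * prev + (1 - lambda) * x[i]
    out[i] <- prev
  }
  out
}

# Mean of a forward EWMA and a backward EWMA (run on the reversed vector)
rr_ewma_sketch <- function(x, lambda) {
  (ewma_sketch(x, lambda) + rev(ewma_sketch(rev(x), lambda))) / 2
}

x <- c(2, 4, 6, 8)
ss_sketch(x)                        # 20
ss_sketch(x, correct_mean = FALSE)  # 120
normalise_sketch(5, 200)            # 25, i.e. 25 per 1000
ewma_sketch(x, lambda = 0)          # 2 4 6 8, identical to x
ewma_sketch(x, lambda = 1)          # 5 5 5 5, the mean of x
rr_ewma_sketch(x, lambda = 0.5)
```

For intermediate values of lambda, the package's output may differ from this sketch if it uses another starting value or weighting.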

Details

These are the explanations of the functions:

  • se() calculates the standard error: sd / square root of length

  • ci() calculates the confidence interval for a mean (95% by default) and returns a vector of length 2

  • sum_of_squares() calculates the sum of (x - mean(x)) ^ 2

  • cv() calculates the coefficient of variation: standard deviation / mean

  • cqv() calculates the coefficient of quartile variation: (Q3 - Q1) / (Q3 + Q1)

  • mse() calculates the mean squared error

  • mape() calculates the mean absolute percentage error

  • rmse() calculates the root mean squared error

  • mae() calculates the mean absolute error

  • z_score() calculates the number of standard deviations from the mean: (x - mean) / sd

  • midhinge() calculates the mean of the first and third quartiles: (Q1 + Q3) / 2

  • ewma() calculates the EWMA (exponentially weighted moving average)

  • rr_ewma() calculates the rrEWMA (reversed-recombined exponentially weighted moving average)

  • normalise() (or normalize()) normalises the data based on a reference: (n / n_ref) * per

  • scale_sd() normalises the data to have a standard deviation of 1, while retaining the mean

  • centre_mean() normalises the data to have a mean of 0, while retaining the standard deviation

  • percentiles() and deciles() take a numeric vector as input and return, for each value, the lowest percentile or decile that the value falls in
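The formulas listed above can be written out in base R as follows. This is a sketch of the stated definitions only, not the package's source. Three points are assumptions: the t-based confidence interval (the package may use a different quantile), the way the data are rescaled for scale_sd(), and MAPE being expressed as a fraction rather than as a percentage.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)

# Spread and location metrics
se_x  <- sd(x) / sqrt(length(x))                    # standard error
cv_x  <- sd(x) / mean(x)                            # coefficient of variation
q     <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 = 4, Q3 = 8 (type 7)
cqv_x <- (q[2] - q[1]) / (q[2] + q[1])              # 4 / 12
mh_x  <- (q[1] + q[2]) / 2                          # midhinge, 6
z_x   <- (x - mean(x)) / sd(x)                      # z-scores

# ASSUMPTION: t-based confidence interval for the mean
level <- 0.95
t_crit <- qt(1 - (1 - level) / 2, df = length(x) - 1)
ci_x <- mean(x) + c(-1, 1) * t_crit * se_x          # lower and upper bound

# ASSUMPTION: one way to get sd 1 with the same mean, and mean 0 with the same sd
scaled  <- (x - mean(x)) / sd(x) + mean(x)
centred <- x - mean(x)

# Error metrics for predictions
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 36)
err <- actual - predicted                           # -2  2 -3  4

mse_v  <- mean(err^2)                               # 8.25
rmse_v <- sqrt(mse_v)
mae_v  <- mean(abs(err))                            # 2.75
mape_v <- mean(abs(err / actual))                   # 0.125, as a fraction
```

With these values, `sd(scaled)` is 1 while `mean(scaled)` equals `mean(x)`, and `mean(centred)` is 0 while `sd(centred)` equals `sd(x)`.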

Default values of na.rm

This 'certestats' package supports a global default setting for na.rm in many mathematical functions. This can be set with options(na.rm = TRUE) or options(na.rm = FALSE).
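The mechanism behind this default can be sketched with a small stand-alone function. The name se_sketch is hypothetical; it only mimics the `na.rm = getOption("na.rm", FALSE)` pattern from the usage section and is not the package's se().

```r
# Hypothetical function using the same default-argument pattern as the package
se_sketch <- function(x, na.rm = getOption("na.rm", FALSE)) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  sd(x) / sqrt(length(x))
}

x <- c(1, 2, 3, NA)

old <- options(na.rm = NULL)   # make sure the option is unset, keep the old value
res_unset <- se_sketch(x)      # NA, because na.rm falls back to FALSE

options(na.rm = TRUE)
res_set <- se_sketch(x)        # 0.5773503, i.e. sd(1:3) / sqrt(3)

options(old)                   # restore the previous setting
```

Because the option is read when the function is called, an explicit `na.rm = FALSE` in the call still overrides the global setting.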

For normality(), quantile() and IQR(), a global default can likewise be set for the type argument. The default, type = 7, is also the default of base R. Use type = 6 to comply with SPSS.
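The difference between the two quantile types can be seen with the base R functions, called here explicitly through the stats namespace. The example only passes type directly; it does not show how the package stores its global default.

```r
x <- 1:10

# First quartile under both definitions
q7 <- stats::quantile(x, 0.25, type = 7, names = FALSE)  # 3.25 (base R default)
q6 <- stats::quantile(x, 0.25, type = 6, names = FALSE)  # 2.75 (SPSS-compatible)

# The interquartile range changes accordingly
iqr7 <- stats::IQR(x, type = 7)  # 7.75 - 3.25 = 4.5
iqr6 <- stats::IQR(x, type = 6)  # 8.25 - 2.75 = 5.5
```

The gap between the two types shrinks as the sample grows, so the choice matters most for small samples.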

Examples

x <- c(0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 6)
percentiles(x)
#>  [1]   1   8  17  25  33  42  42  42  42  42  42  42 100
deciles(x)
#>  [1]  1  1  2  2  3  5  5  5  5  5  5  5 10

percentiles(rnorm(10))
#>  [1]  22  78  67 100  56  12  44  89   1  34

library(dplyr, warn.conflicts = FALSE)
tib <- as_tibble(matrix(as.integer(runif(40, min = 1, max = 7)), ncol = 4),
                 .name_repair = function(...) LETTERS[1:4])
tib
#> # A tibble: 10 × 4
#>        A     B     C     D
#>    <int> <int> <int> <int>
#>  1     1     1     1     6
#>  2     1     3     5     5
#>  3     1     2     4     6
#>  4     5     2     5     3
#>  5     6     4     5     2
#>  6     6     2     1     5
#>  7     3     4     3     4
#>  8     2     2     5     1
#>  9     3     6     4     3
#> 10     1     6     4     1

# percentiles per column
tib |> mutate_all(percentiles)
#> # A tibble: 10 × 4
#>        A     B     C     D
#>    <dbl> <dbl> <dbl> <dbl>
#>  1     1     1     1    89
#>  2     1    56    67    67
#>  3     1    12    34    89
#>  4    78    12    67    34
#>  5    89    67    67    22
#>  6    89    12     1    67
#>  7    56    67    22    56
#>  8    44    12    67     1
#>  9    56    89    34    34
#> 10     1    89    34     1