Create a confusion matrix and calculate its metrics. This function is an agnostic yardstick wrapper: it applies every yardstick metric function that is suitable for confusion matrices, without relying on internally hard-coded function names.

confusion_matrix(data, ...)

# S3 method for default
confusion_matrix(data, truth, estimate, na.rm = getOption("na.rm", FALSE), ...)
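
As a rough sketch of what "agnostic yardstick wrapper" means, the lines below compute a comparable set of metrics directly with yardstick. This only illustrates the underlying idea and is not the actual internals of confusion_matrix(); the yardstick example data set two_class_example is used purely for demonstration.

library(yardstick)
# Build a confusion matrix from a data.frame with factor truth/estimate columns
cm <- conf_mat(two_class_example, truth = truth, estimate = predicted)
# summary() applies yardstick's standard confusion matrix metrics
# (accuracy, kappa, sensitivity, specificity, PPV, NPV, MCC, ...)
summary(cm)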

Arguments

data

Either a data.frame containing the columns specified by the truth and estimate arguments, or a table/matrix where the true class results should be in the columns of the table.

...

Not currently used.

truth

The column identifier for the true class results (that is a factor). This should be an unquoted column name although this argument is passed by expression and supports quasiquotation (you can unquote column names). For _vec() functions, a factor vector.

estimate

The column identifier for the predicted class results (that is also a factor). As with truth, this can be specified in different ways, but the primary method is to use an unquoted variable name. For _vec() functions, a factor vector.

na.rm

A logical to indicate whether empty (NA) values must be removed before calculation.

Default values of na.rm

The 'certestats' package supports a global default setting for na.rm in many of its mathematical functions. It can be set with options(na.rm = TRUE) or options(na.rm = FALSE).

For normality(), quantile() and IQR(), this also applies to the type argument. The default, type = 7, is the default of base R. Use type = 6 to comply with SPSS.
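
A minimal sketch of how the global default could be toggled (assuming, as described above, that the affected functions read getOption("na.rm")):

options(na.rm = TRUE)     # functions honouring this option now drop NA values by default
getOption("na.rm", FALSE)
#> [1] TRUE
options(na.rm = FALSE)    # restore the package default of keeping NA values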

Examples

df <- tibble::tibble(name = c("Predict Yes", "Predict No"),
                     "Actual Yes" = c(123, 26),
                     "Actual No" = c(13, 834))
df
#> # A tibble: 2 × 3
#>   name        `Actual Yes` `Actual No`
#>   <chr>              <dbl>       <dbl>
#> 1 Predict Yes          123          13
#> 2 Predict No            26         834
confusion_matrix(df)
#> Original data:
#> 
#>             
#>              Actual Yes Actual No
#>   Actual Yes        123        13
#>   Actual No          26       834
#> 
#> 
#> Model metrics:
#> 
#> Accuracy                                       0.961
#> Area under the Precision Recall Curve (APRC)   0.125
#> Area under the Receiver Operator Curve (AROC)  0.063
#> Balanced Accuracy                              0.937
#> Brier Score for Classification Models (BSCM)   3.389
#> Costs Function for Poor Classification (CFPC)  1.688
#> F Measure                                      0.863
#> Gain Capture                                  -0.874
#> J-Index                                        0.874
#> Kappa                                          0.840
#> Matthews Correlation Coefficient (MCC)         0.842
#> Mean log Loss for Multinomial Data (MLMD)     31.122
#> Negative Predictive Value (NPV)                0.985
#> Positive Predictive Value (PPV)                0.826
#> Precision                                      0.826
#> Prevalence                                     0.150
#> Recall                                         0.904
#> Sensitivity                                    0.904
#> Specificity                                    0.970
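
# A hedged sketch of the data.frame interface described under the truth and
# estimate arguments, using unquoted (quasiquoted) column names; the column
# names below are made up for illustration and the printed output is omitted:
df_long <- tibble::tibble(
  outcome   = factor(c("Yes", "No", "Yes", "No"), levels = c("Yes", "No")),
  predicted = factor(c("Yes", "Yes", "No", "No"), levels = c("Yes", "No")))
confusion_matrix(df_long, truth = outcome, estimate = predicted)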