
Use the quadagree package to estimate and do inference on the quadratically weighted Fleiss’ kappa and Conger’s kappa (the multirater variant of Cohen’s kappa). It supports data in two forms: raw and aggregated. An example of data in “raw” or “wide” form is the following data set with 4 raters and 5 categories from Zapf et al. (2016).

knitr::kable(head(dat.zapf2016))
Rater A Rater B Rater C Rater D
5 5 4 5
1 1 1 1
5 5 5 5
1 3 3 3
5 5 5 5
1 1 1 1

To calculate asymptotic confidence intervals for Fleiss’ kappa, use

fleissci(dat.zapf2016)
#> Call: fleissci(x = dat.zapf2016)
#> 
#> 95% confidence interval (n = 50).
#>     0.025     0.975 
#> 0.8418042 0.9549730 
#> 
#> Sample estimates.
#>     kappa        sd 
#> 0.8983886 0.1971016
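For intuition about the quantity being estimated, the point estimate can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: it assumes complete data with numeric category scores, writes the coefficient as one minus the ratio of observed to chance squared disagreement, and uses variances with denominator equal to the number of ratings. The helper name fleiss_quadratic and the toy matrix are made up for illustration, and the result need not match fleissci() to the last digit.

```r
# Hypothetical helper, not part of quadagree.
# x: numeric matrix, rows = items, columns = raters, no missing values.
fleiss_quadratic <- function(x) {
  x <- as.matrix(x)
  # All pairs of distinct raters.
  pairs <- utils::combn(ncol(x), 2)
  # Observed disagreement: mean squared difference over items and rater pairs.
  d_obs <- mean((x[, pairs[1, ]] - x[, pairs[2, ]])^2)
  # Chance disagreement: two independent draws from the pooled ratings,
  # for which E(X - Y)^2 equals twice the pooled variance.
  pooled <- as.vector(x)
  d_exp <- 2 * mean((pooled - mean(pooled))^2)
  1 - d_obs / d_exp
}

# Toy data: 4 items rated by 3 raters on a 1-3 scale.
toy <- rbind(c(1, 1, 1), c(2, 2, 3), c(3, 3, 3), c(1, 2, 1))
fleiss_quadratic(toy)  # 83/107, about 0.776
```

Perfect agreement gives a coefficient of 1, since the observed disagreement is then zero.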

If you want to calculate Conger’s kappa (multi-rater Cohen’s kappa), use

cohenci(dat.zapf2016)
#> Call: cohenci(x = dat.zapf2016)
#> 
#> 95% confidence interval (n = 50).
#>     0.025     0.975 
#> 0.8419597 0.9549804 
#> 
#> Sample estimates.
#>     kappa        sd 
#> 0.8984700 0.1968438
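The difference from Fleiss’ kappa lies in the chance term: Conger’s kappa lets every rater keep their own marginal distribution. The following base R sketch illustrates this for complete data. It is an illustration under assumptions (numeric scores, variances with denominator n, a hypothetical helper name conger_quadratic), not the code used by cohenci().

```r
# Hypothetical helper, not part of quadagree.
# x: numeric matrix, rows = items, columns = raters, no missing values.
conger_quadratic <- function(x) {
  x <- as.matrix(x)
  pairs <- utils::combn(ncol(x), 2)
  r <- pairs[1, ]
  s <- pairs[2, ]
  # Observed disagreement: mean squared difference over items and rater pairs.
  d_obs <- mean((x[, r] - x[, s])^2)
  # Rater-specific means and variances (denominator n).
  mu <- colMeans(x)
  s2 <- colMeans(sweep(x, 2, mu)^2)
  # Chance disagreement for independent draws from raters r and s:
  # var_r + var_s + (mean_r - mean_s)^2, averaged over rater pairs.
  d_exp <- mean(s2[r] + s2[s] + (mu[r] - mu[s])^2)
  1 - d_obs / d_exp
}

toy <- rbind(c(1, 1, 1), c(2, 2, 3), c(3, 3, 3), c(1, 2, 1))
conger_quadratic(toy)  # 7/9, about 0.778
```

With two raters this reduces to the familiar quadratically weighted Cohen’s kappa.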

Finally, the quadratic Brennan-Prediger coefficient can be found by

bpci(dat.zapf2016, kind = 1)
#> Call: bpci(x = dat.zapf2016, kind = 1)
#> 
#> 95% confidence interval (n = 50).
#>     0.025     0.975 
#> 0.8026384 0.9323616 
#> 
#> Sample estimates.
#>     kappa        sd 
#> 0.8675000 0.2259341
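The classical coefficient replaces the data-driven chance term with the disagreement expected when both ratings are drawn uniformly from the C categories 1, ..., C, which for squared differences is (C^2 - 1) / 6. The sketch below shows this in base R; the helper name bp_quadratic and its n_categories parameter are hypothetical, and it covers only the classical formulation for complete data, not the package's inference.

```r
# Hypothetical helper, not part of quadagree.
# x: numeric matrix of ratings coded 1, ..., n_categories, no missing values.
bp_quadratic <- function(x, n_categories) {
  x <- as.matrix(x)
  pairs <- utils::combn(ncol(x), 2)
  # Observed disagreement: mean squared difference over items and rater pairs.
  d_obs <- mean((x[, pairs[1, ]] - x[, pairs[2, ]])^2)
  # Chance disagreement under uniform ratings: twice the variance of a
  # discrete uniform distribution on 1, ..., n_categories.
  d_exp <- (n_categories^2 - 1) / 6
  1 - d_obs / d_exp
}

toy <- rbind(c(1, 1, 1), c(2, 2, 3), c(3, 3, 3), c(1, 2, 1))
bp_quadratic(toy, n_categories = 3)  # 0.75
```

With 5 categories, as in the Zapf et al. data, the uniform chance term is (25 - 1) / 6 = 4.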

Setting kind = 1 returns the classical Brennan-Prediger coefficient, while kind = 2 returns the new formulation by Moss (2023).

bpci(dat.zapf2016, kind = 2)
#> Call: bpci(x = dat.zapf2016, kind = 2)
#> 
#> 95% confidence interval (n = 50).
#>     0.025     0.975 
#> 0.9013192 0.9661808 
#> 
#> Sample estimates.
#>    kappa       sd 
#> 0.933750 0.112967

It is also possible to use the studentized bootstrap by setting bootstrap = TRUE, but this appears to confer no benefits, and we recommend sticking to the standard methods.

The package supports several options for estimating the asymptotic covariance matrix through the type argument. The adf option makes no assumptions about the data-generating process and is recommended. The elliptical option assumes that the rating data are elliptically distributed, and the normal option that they are normally distributed. There are also a few transforms for the confidence intervals: none, fisher, and arcsin. The Fisher and arcsine transforms appear to work slightly better in practice.
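To see what a transform does to an interval, here is a rough sketch built from the numbers printed by fleissci(dat.zapf2016) above. It rests on assumptions: the standard error sd / sqrt(n - 1) and the Student t quantile with n - 1 degrees of freedom were inferred from that printed output (they reproduce the displayed interval), and the transformed intervals use the usual delta-method construction with atanh and asin, which may differ in detail from what the package does internally.

```r
# Values copied from the printed fleissci(dat.zapf2016) output.
kappa <- 0.8983886
sd_hat <- 0.1971016
n <- 50

# Assumed standard error and quantile (inferred from the printed interval).
se <- sd_hat / sqrt(n - 1)
q <- qt(0.975, df = n - 1)

# No transform: symmetric interval around the estimate.
ci_none <- kappa + c(-1, 1) * q * se

# Fisher transform: build the interval for atanh(kappa), then map back.
# Delta method: the derivative of atanh at kappa is 1 / (1 - kappa^2).
ci_fisher <- tanh(atanh(kappa) + c(-1, 1) * q * se / (1 - kappa^2))

# Arcsine transform: the derivative of asin at kappa is 1 / sqrt(1 - kappa^2).
ci_arcsin <- sin(asin(kappa) + c(-1, 1) * q * se / sqrt(1 - kappa^2))

rbind(none = ci_none, fisher = ci_fisher, arcsin = ci_arcsin)
```

The transformed intervals are asymmetric around the estimate, and the Fisher interval cannot extend beyond 1, which is one reason such transforms can behave better when the coefficient is close to its upper bound.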

All examples in this vignette use categorical data, but the package also supports continuous data, which is most meaningful for Fleiss’ kappa, Conger’s kappa, and the new Brennan-Prediger coefficient (i.e., with kind = 2).

This package supports missing data in the following sense: the same set of raters is meant to rate every item, but the ratings of some raters are missing for some of the rows. An example is the Klein (2018) data:

knitr::kable(dat.klein2018)
rater1 rater2 rater3 rater4 rater5
1 2 2 NA 2
1 1 3 3 3
3 3 3 3 3
1 1 1 1 3
1 1 1 3 3
1 2 2 2 2
1 1 1 1 1
2 2 2 2 3
1 3 NA NA 3
1 1 1 3 3

Here missing data is marked as NA. The estimators of the coefficients are consistent, and the inference is based on pairwise available data using the method of van Praag (1985):

fleissci(dat.klein2018)
#> Call: fleissci(x = dat.klein2018)
#> 
#> 95% confidence interval (n = 10).
#>      0.025      0.975 
#> -0.1549052  0.6048678 
#> 
#> Sample estimates.
#>     kappa        sd 
#> 0.2249813 0.5037932
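As an illustration of what “pairwise available” means, the base R sketch below retypes the Klein (2018) table shown above and computes, for each pair of raters, the number of rows rated by both and the covariance based on those rows only. This is only meant to convey the idea; it is not the van Praag procedure itself, and stats::cov uses the n - 1 denominator, which may differ from the package's convention.

```r
# Klein (2018) ratings, transcribed from the table above.
klein <- matrix(c(
  1, 2,  2, NA, 2,
  1, 1,  3,  3, 3,
  3, 3,  3,  3, 3,
  1, 1,  1,  1, 3,
  1, 1,  1,  3, 3,
  1, 2,  2,  2, 2,
  1, 1,  1,  1, 1,
  2, 2,  2,  2, 3,
  1, 3, NA, NA, 3,
  1, 1,  1,  3, 3
), ncol = 5, byrow = TRUE,
dimnames = list(NULL, paste0("rater", 1:5)))

# Number of rows rated by both raters, for every pair of raters.
n_pair <- crossprod(1 - is.na(klein))

# Covariances computed from the rows available for each pair.
cov_pair <- cov(klein, use = "pairwise.complete.obs")

n_pair["rater3", "rater4"]  # 8 of the 10 rows
```

Every pair of raters shares at least 8 rows here, so each pairwise covariance is estimable.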

Inference is not supported when some pair of raters has fewer than 2 items rated by both. The following data from Gwet (2014) is an example:

knitr::kable(dat.gwet2014)
rater1 rater2 rater3 rater4 rater5
1 1 2 NA 2
1 1 0 1 NA
2 3 3 3 NA
NA 0 0 NA 0
0 0 0 NA 0
0 0 0 NA 0
1 0 2 NA 1
1 NA 2 0 NA
2 2 2 NA 2
2 1 1 1 NA
NA 1 0 0 NA
0 0 0 0 NA
1 2 2 2 NA
3 3 2 2 3
1 1 1 NA 1
1 1 1 NA 1
2 1 2 NA 2
1 2 3 3 NA
1 1 0 1 NA
0 0 0 NA 0

Observe that only one row is rated by both rater 4 and 5. Hence you get an error when calling

fleissci(dat.gwet2014)
#> Error in fun(calc): The data does not contain sufficient non-NAs.
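A quick way to diagnose this kind of failure is to count, for every pair of raters, how many rows both of them rated. The sketch below does this in base R on the Gwet (2014) table, retyped from the listing above; it is a diagnostic suggestion, not a function provided by the package.

```r
# Gwet (2014) ratings, transcribed from the table above.
gwet <- matrix(c(
   1,  1, 2, NA,  2,
   1,  1, 0,  1, NA,
   2,  3, 3,  3, NA,
  NA,  0, 0, NA,  0,
   0,  0, 0, NA,  0,
   0,  0, 0, NA,  0,
   1,  0, 2, NA,  1,
   1, NA, 2,  0, NA,
   2,  2, 2, NA,  2,
   2,  1, 1,  1, NA,
  NA,  1, 0,  0, NA,
   0,  0, 0,  0, NA,
   1,  2, 2,  2, NA,
   3,  3, 2,  2,  3,
   1,  1, 1, NA,  1,
   1,  1, 1, NA,  1,
   2,  1, 2, NA,  2,
   1,  2, 3,  3, NA,
   1,  1, 0,  1, NA,
   0,  0, 0, NA,  0
), ncol = 5, byrow = TRUE,
dimnames = list(NULL, paste0("rater", 1:5)))

# Entry (r, s) is the number of rows rated by both rater r and rater s.
n_pair <- crossprod(1 - is.na(gwet))
n_pair

# The pair of raters with the fewest rows in common.
which(n_pair == min(n_pair), arr.ind = TRUE)
```

The smallest count is the single row shared by rater4 and rater5, which is too little to estimate their covariance.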

Aggregated data

Data on aggregated form looks different: each row corresponds to one item and gives the number of ratings it received in each category. The following example is from Fleiss (1971).

knitr::kable(head(dat.fleiss1971))
depression  personality disorder  schizophrenia  neurosis  other
         0                     0              0         6      0
         0                     3              0         0      3
         0                     1              4         0      1
         0                     0              0         0      6
         0                     3              0         3      0
         2                     0              4         0      0
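To make the relation between the two forms concrete, the following base R sketch converts wide data to aggregated counts. It uses the six rows of the Zapf et al. data printed at the top of this vignette, retyped by hand; the column names cat1 to cat5 are placeholders chosen for the example. Note that the conversion discards the information about which rater gave which rating.

```r
# First six rows of the Zapf et al. (2016) data, as printed earlier.
wide <- rbind(
  c(5, 5, 4, 5),
  c(1, 1, 1, 1),
  c(5, 5, 5, 5),
  c(1, 3, 3, 3),
  c(5, 5, 5, 5),
  c(1, 1, 1, 1)
)

# Count, for each item (row), how many ratings fall in categories 1 to 5.
# apply() returns one column per item, so the result is transposed.
aggr <- t(apply(wide, 1, tabulate, nbins = 5))
colnames(aggr) <- paste0("cat", 1:5)
aggr
```

Each row of the result sums to 4, the number of raters, and the first row reads 0 0 0 1 3: one rating in category 4 and three in category 5.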

The dataset has categories

colnames(dat.fleiss1971)
#> [1] "depression"           "personality disorder" "schizophrenia"       
#> [4] "neurosis"             "other"

Provided we understand depression as being 1, personality disorder as being 2, and so on, we can calculate and do inference on the quadratically weighted Fleiss’ kappa (but not Conger’s kappa) for data on this form.

fleissci_aggr(dat.fleiss1971)
#> Call: fleissci_aggr(x = dat.fleiss1971)
#> 
#> 95% confidence interval (n = 30).
#>      0.025      0.975 
#> 0.05668483 0.51145967 
#> 
#> Sample estimates.
#> kappa.xtx        sd 
#> 0.2840722 0.5987194
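The point estimate only needs the category counts, which is why it is available for aggregated data. The base R sketch below illustrates this under assumptions: the helper fleiss_quadratic_aggr and its values parameter are written for this example, categories are scored 1, 2, ... by default, every item has at least two ratings, and variances use the total number of ratings as denominator. It is not the implementation behind fleissci_aggr(), so small numerical differences are possible.

```r
# Hypothetical helper, not part of quadagree.
# counts: matrix with one row per item and one column per category.
# values: numeric score attached to each category (default 1, 2, ...).
fleiss_quadratic_aggr <- function(counts, values = seq_len(ncol(counts))) {
  counts <- as.matrix(counts)
  n_ratings <- rowSums(counts)                 # ratings per item
  s1 <- as.vector(counts %*% values)           # sum of scores per item
  s2 <- as.vector(counts %*% values^2)         # sum of squared scores per item
  # Mean squared difference between two distinct ratings of the same item,
  # averaged over items.
  d_obs <- mean(2 * (n_ratings * s2 - s1^2) / (n_ratings * (n_ratings - 1)))
  # Chance disagreement: twice the variance of the pooled category distribution.
  p <- colSums(counts) / sum(counts)
  mu <- sum(p * values)
  d_exp <- 2 * sum(p * (values - mu)^2)
  1 - d_obs / d_exp
}

# Toy counts: 4 items, 3 ratings each, 3 categories.
toy_counts <- rbind(c(3, 0, 0), c(0, 2, 1), c(0, 0, 3), c(2, 1, 0))
fleiss_quadratic_aggr(toy_counts)  # 83/107, about 0.776
```

Passing a different values vector, for example c(0, 1, 4), changes the distances between categories and therefore the coefficient.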

The results from the aggregated functions agree with irrCAC, even though the underlying methodology differs. For aggregated data, the main contributions of this package are the ability to calculate the new variant of the Brennan-Prediger coefficient, support for a values argument that tells the program what value to attach to each rating, and support for transforms. The calculations are also marginally faster, by a factor of about 2, but this is unlikely to have an impact. Note that inference for missing data on aggregated form is not supported. Data on aggregated form can, however, be bootstrapped, and the transforms none, fisher, and arcsin are supported. The type options used for raw data do not make sense in this context.