
Using the quadagree package
Use the quadagree package to estimate and do inference
on the quadratically weighted Fleiss’ kappa and Conger’s kappa (the
multirater variant of Cohen’s kappa). It supports data in two forms: raw
or aggregated. An example of data in “raw” or “wide” form is the
following data set with 4 raters and 5 categories from Zapf et
al. (2016).
| Rater A | Rater B | Rater C | Rater D |
|---|---|---|---|
| 5 | 5 | 4 | 5 |
| 1 | 1 | 1 | 1 |
| 5 | 5 | 5 | 5 |
| 1 | 3 | 3 | 3 |
| 5 | 5 | 5 | 5 |
| 1 | 1 | 1 | 1 |
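To make the quantities concrete, here is a minimal base R sketch of the textbook definitions, written as one minus the ratio of observed to chance quadratic disagreement. It is an illustration and not the package's implementation: the helper names `observed_disagreement`, `quad_fleiss` and `quad_conger` are hypothetical, the data are only the six rows shown above, and the estimators in quadagree may differ in finite-sample details.

```r
# The six rows of the Zapf et al. (2016) data shown in the table above.
x <- cbind(
  A = c(5, 1, 5, 1, 5, 1),
  B = c(5, 1, 5, 3, 5, 1),
  C = c(4, 1, 5, 3, 5, 1),
  D = c(5, 1, 5, 3, 5, 1)
)

# Observed disagreement: squared difference between two distinct raters
# on the same item, averaged over items and rater pairs.
observed_disagreement <- function(x) {
  pairs <- utils::combn(ncol(x), 2)
  mean((x[, pairs[1, ], drop = FALSE] - x[, pairs[2, ], drop = FALSE])^2)
}

# Fleiss-type chance disagreement: two independent draws from the pooled
# ratings, which equals twice the pooled (population) variance.
quad_fleiss <- function(x) {
  d_chance <- 2 * mean((x - mean(x))^2)
  1 - observed_disagreement(x) / d_chance
}

# Conger-type chance disagreement: each rater keeps their own marginal
# distribution, so ratings are compared across different items.
quad_conger <- function(x) {
  pairs <- utils::combn(ncol(x), 2)
  d_pair <- apply(pairs, 2, function(p) {
    mean(outer(x[, p[1]], x[, p[2]], "-")^2)
  })
  1 - observed_disagreement(x) / mean(d_pair)
}

quad_fleiss(x)
quad_conger(x)
```

The two coefficients differ only in how chance disagreement is defined, which is why they tend to be close when the raters have similar marginal distributions.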
To calculate asymptotic confidence intervals for Fleiss’ kappa, use
fleissci(dat.zapf2016)
#> Call: fleissci(x = dat.zapf2016)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.8418042 0.9549730
#>
#> Sample estimates.
#> kappa sd
#> 0.8983886 0.1971016
If you want to calculate Conger’s kappa (multi-rater Cohen’s kappa), use
cohenci(dat.zapf2016)
#> Call: cohenci(x = dat.zapf2016)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.8419597 0.9549804
#>
#> Sample estimates.
#> kappa sd
#> 0.8984700 0.1968438
Finally, the quadratic Brennan-Prediger coefficient can be found by
bpci(dat.zapf2016, kind = 1)
#> Call: bpci(x = dat.zapf2016, kind = 1)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.8026384 0.9323616
#>
#> Sample estimates.
#> kappa sd
#> 0.8675000 0.2259341
Setting kind = 1 returns the classical Brennan-Prediger
coefficient, while kind = 2 returns the new formulation by Moss
(2023).
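For reference, the classical quadratically weighted Brennan-Prediger coefficient can be sketched in base R as follows. The sketch assumes categories coded 1 to C and a uniform chance distribution over those categories, for which the chance quadratic disagreement is (C^2 - 1) / 6. The helper name `quad_bp` and its `n_cat` parameter are hypothetical, the data are six rows of the Zapf et al. (2016) data, and the new formulation by Moss (2023) is not covered.

```r
# Six rows of the Zapf et al. (2016) data (4 raters, 5 categories).
x <- cbind(
  A = c(5, 1, 5, 1, 5, 1),
  B = c(5, 1, 5, 3, 5, 1),
  C = c(4, 1, 5, 3, 5, 1),
  D = c(5, 1, 5, 3, 5, 1)
)

# Classical quadratic Brennan-Prediger coefficient (assumed form):
# chance disagreement comes from two independent uniform draws on 1..n_cat.
quad_bp <- function(x, n_cat) {
  pairs <- utils::combn(ncol(x), 2)
  # Mean squared difference between distinct raters on the same item.
  d_obs <- mean((x[, pairs[1, ], drop = FALSE] - x[, pairs[2, ], drop = FALSE])^2)
  # Twice the variance of a discrete uniform distribution on 1..n_cat.
  d_chance <- (n_cat^2 - 1) / 6
  1 - d_obs / d_chance
}

quad_bp(x, n_cat = 5)
```

Unlike Fleiss' and Conger's kappa, the chance term here depends only on the number of categories, not on the observed ratings.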
bpci(dat.zapf2016, kind = 2)
#> Call: bpci(x = dat.zapf2016, kind = 2)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.9013192 0.9661808
#>
#> Sample estimates.
#> kappa sd
#> 0.933750 0.112967
It is also possible to use the studentized bootstrap by selecting the
option bootstrap = TRUE, but this appears to confer no
benefits and we would recommend sticking to the standard methods.
The package supports several options for estimating the asymptotic
covariance matrix using the type argument. The
adf option makes no assumptions about the data-generating
process and is recommended. The elliptical option assumes
the rating data is elliptically distributed, and the normal option that
it is normally distributed. There are also three transforms for the confidence
intervals: none, fisher, and
arcsin. The Fisher and arcsine transforms appear to work
slightly better in practice.
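The role of the transforms can be illustrated with the numbers printed for fleissci above. The untransformed interval in that output appears to be the estimate plus or minus a t quantile (n - 1 degrees of freedom) times sd / sqrt(n - 1); this is inferred from the printed values and is not a documented formula. The Fisher and arcsine versions below use the standard delta-method construction, which is an assumption about how such transforms are usually applied and may not match the package exactly. The function name `kappa_ci` is hypothetical.

```r
# Values copied from the fleissci(dat.zapf2016) output above.
est <- 0.8983886
sd_hat <- 0.1971016
n <- 50

# Confidence interval on the original or a transformed scale.
kappa_ci <- function(est, sd_hat, n,
                     transform = c("none", "fisher", "arcsin"),
                     conf_level = 0.95) {
  transform <- match.arg(transform)
  se <- sd_hat / sqrt(n - 1)                        # assumed standard error
  q <- qt(1 - (1 - conf_level) / 2, df = n - 1)     # t quantile
  step <- c(-1, 1) * q * se
  switch(transform,
    none = est + step,
    # Delta method: d/dk atanh(k) = 1 / (1 - k^2).
    fisher = tanh(atanh(est) + step / (1 - est^2)),
    # Delta method: d/dk asin(k) = 1 / sqrt(1 - k^2); the back-transform
    # assumes the limits stay inside [-pi / 2, pi / 2].
    arcsin = sin(asin(est) + step / sqrt(1 - est^2))
  )
}

kappa_ci(est, sd_hat, n, "none")
kappa_ci(est, sd_hat, n, "fisher")
kappa_ci(est, sd_hat, n, "arcsin")
```

The Fisher interval always stays inside (-1, 1), which is one reason a transform can behave better when the estimate is close to 1.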
All examples in this vignette use categorical data, but the package
also supports continuous data, which is most meaningful for Fleiss’
kappa, Conger’s kappa, and the new Brennan-Prediger coefficient (i.e.,
with kind = 2).
This package supports missing data in the following sense: the same set of raters rates every item, but some ratings are missing for some of the rows. An example is the Klein (2018) data:
knitr::kable(dat.klein2018)
| rater1 | rater2 | rater3 | rater4 | rater5 |
|---|---|---|---|---|
| 1 | 2 | 2 | NA | 2 |
| 1 | 1 | 3 | 3 | 3 |
| 3 | 3 | 3 | 3 | 3 |
| 1 | 1 | 1 | 1 | 3 |
| 1 | 1 | 1 | 3 | 3 |
| 1 | 2 | 2 | 2 | 2 |
| 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 3 |
| 1 | 3 | NA | NA | 3 |
| 1 | 1 | 1 | 3 | 3 |
Here missing data is marked as NA. The estimators of
the coefficients are consistent, and the inference is based on pairwise
available data using the method of van Praag (1985):
fleissci(dat.klein2018)
#> Call: fleissci(x = dat.klein2018)
#>
#> 95% confidence interval (n = 10).
#> 0.025 0.975
#> -0.1549052 0.6048678
#>
#> Sample estimates.
#> kappa sd
#> 0.2249813 0.5037932
Inference is not supported when some pair of raters has too few items rated in common. The following data from Gwet (2014) is an example:
knitr::kable(dat.gwet2014)
| rater1 | rater2 | rater3 | rater4 | rater5 |
|---|---|---|---|---|
| 1 | 1 | 2 | NA | 2 |
| 1 | 1 | 0 | 1 | NA |
| 2 | 3 | 3 | 3 | NA |
| NA | 0 | 0 | NA | 0 |
| 0 | 0 | 0 | NA | 0 |
| 0 | 0 | 0 | NA | 0 |
| 1 | 0 | 2 | NA | 1 |
| 1 | NA | 2 | 0 | NA |
| 2 | 2 | 2 | NA | 2 |
| 2 | 1 | 1 | 1 | NA |
| NA | 1 | 0 | 0 | NA |
| 0 | 0 | 0 | 0 | NA |
| 1 | 2 | 2 | 2 | NA |
| 3 | 3 | 2 | 2 | 3 |
| 1 | 1 | 1 | NA | 1 |
| 1 | 1 | 1 | NA | 1 |
| 2 | 1 | 2 | NA | 2 |
| 1 | 2 | 3 | 3 | NA |
| 1 | 1 | 0 | 1 | NA |
| 0 | 0 | 0 | NA | 0 |
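A quick way to see how much data each pair of raters shares is to count the rows where both ratings are present. The sketch below does this in base R for the rater4 and rater5 columns of the table above, copied by hand; the object names are illustrative.

```r
# rater4 and rater5 columns of the Gwet (2014) table above.
ratings <- cbind(
  rater4 = c(NA, 1, 3, NA, NA, NA, NA, 0, NA, 1,
             0, 0, 2, 2, NA, NA, NA, 3, 1, NA),
  rater5 = c(2, NA, NA, 0, 0, 0, 1, NA, 2, NA,
             NA, NA, NA, 3, 1, 1, 2, NA, NA, 0)
)

# Indicator matrix: 1 where a rating is present, 0 where it is missing.
observed <- !is.na(ratings)
storage.mode(observed) <- "integer"

# Entry (r, s) counts the items rated by both rater r and rater s;
# the diagonal counts the items rated by each rater.
shared <- crossprod(observed)
shared
```

The same `crossprod` call works on a full ratings matrix and returns one count for every pair of raters.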
Observe that only one row is rated by both rater 4 and 5. Hence you get an error when calling
fleissci(dat.gwet2014)
#> Error in fun(calc): The data does not contain sufficient non-NAs.
Aggregated data
Data on aggregated form looks different, with each row aggregating the number of ratings for each category. The following example is from Fleiss (1971).
| depression | personality disorder | schizophrenia | neurosis | other |
|---|---|---|---|---|
| 0 | 0 | 0 | 6 | 0 |
| 0 | 3 | 0 | 0 | 3 |
| 0 | 1 | 4 | 0 | 1 |
| 0 | 0 | 0 | 0 | 6 |
| 0 | 3 | 0 | 3 | 0 |
| 2 | 0 | 4 | 0 | 0 |
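The relationship between the two forms can be sketched in base R: raw data are turned into aggregated counts by tabulating each row. The example uses six rows of the Zapf et al. (2016) data from earlier in this vignette; the column labels `cat1` to `cat5` are illustrative.

```r
# Six rows of the Zapf et al. (2016) raw data (4 raters, 5 categories).
x <- cbind(
  A = c(5, 1, 5, 1, 5, 1),
  B = c(5, 1, 5, 3, 5, 1),
  C = c(4, 1, 5, 3, 5, 1),
  D = c(5, 1, 5, 3, 5, 1)
)

n_cat <- 5

# For each item (row), count how many raters chose each category.
counts <- t(apply(x, 1, tabulate, nbins = n_cat))
colnames(counts) <- paste0("cat", seq_len(n_cat))
counts
```

Each row of `counts` sums to the number of raters. The aggregation discards which rater gave which rating, so coefficients that need rater-specific marginal distributions cannot be recovered from it.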
The dataset has categories
colnames(dat.fleiss1971)
#> [1] "depression" "personality disorder" "schizophrenia"
#> [4] "neurosis" "other"
Provided we understand depression as being
1, personality disorder as being
2, and so on, we can calculate and do inference on the
quadratically weighted Fleiss’ kappa (but not Conger’s kappa) for data on
this form.
fleissci_aggr(dat.fleiss1971)
#> Call: fleissci_aggr(x = dat.fleiss1971)
#>
#> 95% confidence interval (n = 30).
#> 0.025 0.975
#> 0.05668483 0.51145967
#>
#> Sample estimates.
#> kappa.xtx sd
#> 0.2840722 0.5987194
The results from using the aggregated functions agree with
irrCAC, despite the underlying methodology being different.
The main contributions of this package when it comes to aggregated data
are the ability to calculate the new variant of the Brennan-Prediger
coefficient, the support for a values argument telling the program
what value to attach to each rating, and the support for transforms. The
calculations are also marginally faster, but this is unlikely to have an
impact. Note that inference for
missing data on aggregated form is not supported. Data on aggregated
form can, however, be bootstrapped, and the transforms
none, fisher and arcsin are
supported. The type options used for data on raw form do not
make sense in this context.
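To illustrate what attaching values to categories means, the following base R sketch computes a quadratically weighted Fleiss-type kappa directly from aggregated counts, using the six rows of the Fleiss (1971) data shown above. It follows the textbook definition (one minus observed over chance quadratic disagreement) and is not the package's implementation; the function name `quad_fleiss_aggr` is hypothetical and its `values` parameter only mimics the idea described above.

```r
# Six rows of the Fleiss (1971) data shown above (6 raters per item).
counts <- rbind(
  c(0, 0, 0, 6, 0),
  c(0, 3, 0, 0, 3),
  c(0, 1, 4, 0, 1),
  c(0, 0, 0, 0, 6),
  c(0, 3, 0, 3, 0),
  c(2, 0, 4, 0, 0)
)

quad_fleiss_aggr <- function(counts, values = seq_len(ncol(counts))) {
  counts <- as.matrix(counts)
  r <- rowSums(counts)  # number of ratings per item
  # Squared distance between the values attached to each pair of categories.
  d <- outer(values, values, function(a, b) (a - b)^2)
  # Observed disagreement per item, over ordered pairs of distinct ratings.
  d_item <- rowSums((counts %*% d) * counts) / (r * (r - 1))
  # Chance disagreement from the pooled category proportions.
  p <- colSums(counts) / sum(counts)
  d_chance <- sum(outer(p, p) * d)
  1 - mean(d_item) / d_chance
}

# Default: categories valued 1, 2, ..., 5 in column order.
quad_fleiss_aggr(counts)

# Hypothetical alternative: place the last category far from the rest.
quad_fleiss_aggr(counts, values = c(1, 2, 3, 4, 10))
```

Changing the values changes the distances between categories and therefore the coefficient, which is why the ordering and spacing of nominal-looking categories should be chosen deliberately.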