Using the quadagree package
Use the quadagree package to estimate and do inference on the quadratically weighted Fleiss’ kappa and Conger’s kappa (the multirater variant of Cohen’s kappa). It supports data in two forms: raw or aggregated. An example of data in “raw” or “wide” form is the following data set, `dat.zapf2016`, from Zapf et al. (2016), with 4 raters and 5 categories. Only the first six of its 50 rows are shown.
Rater A | Rater B | Rater C | Rater D |
---|---|---|---|
5 | 5 | 4 | 5 |
1 | 1 | 1 | 1 |
5 | 5 | 5 | 5 |
1 | 3 | 3 | 3 |
5 | 5 | 5 | 5 |
1 | 1 | 1 | 1 |
To calculate asymptotic confidence intervals for Fleiss’ kappa, use
fleissci(dat.zapf2016)
#> Call: fleissci(x = dat.zapf2016)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.8418042 0.9549730
#>
#> Sample estimates.
#> kappa sd
#> 0.8983886 0.1971016
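To make the estimand concrete, here is a minimal plug-in sketch of the quadratically weighted Fleiss’ kappa for complete raw data in base R. It relies on the fact that, with quadratic weights, the mean squared difference between two ratings is twice a variance. Observed disagreement is twice the mean within-item variance, and chance disagreement is twice the variance of all ratings pooled together. The name `fleiss_quad()` is hypothetical and not part of quadagree. The package’s estimator and its inference may differ in small-sample details, so this sketch is not meant to reproduce the output above.

```r
# Hypothetical helper (not part of quadagree): plug-in quadratically weighted
# Fleiss' kappa for a complete items-by-raters matrix of numeric ratings.
fleiss_quad <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), !anyNA(x), ncol(x) >= 2)
  # Half the observed disagreement: mean over items of the sample variance
  # of each item's ratings (= half the mean squared difference over rater pairs)
  within <- mean(apply(x, 1, var))
  # Half the chance disagreement: population variance of all pooled ratings
  pooled <- as.vector(x)
  between <- mean((pooled - mean(pooled))^2)
  1 - within / between
}

# Toy example: 4 items rated by 3 raters on a 1-3 scale
x <- rbind(c(1, 1, 2),
           c(3, 3, 3),
           c(1, 2, 1),
           c(3, 2, 3))
fleiss_quad(x)  # 71/107, about 0.664
```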
If you want to calculate Conger’s kappa (multi-rater Cohen’s kappa), use
cohenci(dat.zapf2016)
#> Call: cohenci(x = dat.zapf2016)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.8419597 0.9549804
#>
#> Sample estimates.
#> kappa sd
#> 0.8984700 0.1968438
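The two coefficients differ only in the chance term. Below is a minimal plug-in sketch using a hypothetical helper `conger_quad()`, which is not part of the package. For each pair of raters, observed disagreement is the mean squared difference of their ratings on the same items. Chance disagreement is the mean squared difference when the two raters’ ratings are paired independently, so each rater keeps their own marginal distribution instead of the pooled one. Like any plug-in sketch, it need not match the package’s estimator in small-sample details.

```r
# Hypothetical helper (not part of quadagree): plug-in quadratically weighted
# Conger's kappa for a complete items-by-raters matrix of numeric ratings.
conger_quad <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), !anyNA(x), ncol(x) >= 2)
  pairs <- combn(ncol(x), 2)
  observed <- 0
  chance <- 0
  for (k in seq_len(ncol(pairs))) {
    a <- x[, pairs[1, k]]
    b <- x[, pairs[2, k]]
    # Same item, two raters: observed squared disagreement
    observed <- observed + mean((a - b)^2)
    # Every rating of one rater against every rating of the other: the
    # disagreement expected if each rated independently from their own marginal
    chance <- chance + mean(outer(a, b, "-")^2)
  }
  1 - observed / chance
}

x <- rbind(c(1, 1, 2),
           c(3, 3, 3),
           c(1, 2, 1),
           c(3, 2, 3))
conger_quad(x)  # 2/3
```

Because each rater keeps their own mean, differences in rater means inflate the chance term. This is why Conger’s kappa is typically slightly larger than Fleiss’ kappa, as in the two outputs above.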
Finally, the quadratically weighted Brennan-Prediger coefficient can be computed with
bpci(dat.zapf2016, kind = 1)
#> Call: bpci(x = dat.zapf2016, kind = 1)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.8026384 0.9323616
#>
#> Sample estimates.
#> kappa sd
#> 0.8675000 0.2259341
Setting `kind = 1` returns the classical Brennan-Prediger coefficient, while `kind = 2` returns the new formulation by Moss (2023).
bpci(dat.zapf2016, kind = 2)
#> Call: bpci(x = dat.zapf2016, kind = 2)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.9013192 0.9661808
#>
#> Sample estimates.
#> kappa sd
#> 0.933750 0.112967
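Both Brennan-Prediger variants keep the observed disagreement used above but replace the chance term with a fixed constant. The sketch below rests on two assumptions. For `kind = 1` we use the classical choice: the mean squared difference between two ratings drawn uniformly from the category values. For `kind = 2` we assume the constant is twice the largest variance attainable by ratings in [min, max], that is, (max - min)^2 / 2. The second assumption is not taken from the package documentation. It is consistent with the printed output, though: for five categories the two constants are 4 and 8, and (1 - 0.93375) / (1 - 0.8675) is exactly 1/2. The helper `bp_quad()` is hypothetical.

```r
# Hypothetical helper (not part of quadagree): quadratically weighted
# Brennan-Prediger coefficient for a complete items-by-raters matrix.
# `categories` holds the possible rating values.
bp_quad <- function(x, categories, kind = 1) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), !anyNA(x), ncol(x) >= 2)
  # Observed disagreement: mean squared difference over rater pairs,
  # which equals twice the mean within-item sample variance
  observed <- 2 * mean(apply(x, 1, var))
  chance <- if (kind == 1) {
    # Classical: two ratings drawn independently and uniformly from the categories
    mean(outer(categories, categories, "-")^2)
  } else {
    # ASSUMED form of kind = 2: twice the maximal variance on [min, max]
    diff(range(categories))^2 / 2
  }
  1 - observed / chance
}

x <- rbind(c(1, 1, 2),
           c(3, 3, 3),
           c(1, 2, 1),
           c(3, 2, 3))
bp_quad(x, categories = 1:3, kind = 1)  # 0.625
bp_quad(x, categories = 1:3, kind = 2)  # 0.75
```

Under these assumptions, `kind = 1` needs the full set of categories, whereas `kind = 2` only needs the range of possible values. This fits the vignette’s remark that the new coefficient is the meaningful one for continuous data.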
It is also possible to use the studentized bootstrap by setting `bootstrap = TRUE`, but this appears to confer no benefits, and we recommend sticking to the standard methods.
The package supports several options for estimating the asymptotic covariance matrix through the `type` argument. The `adf` (asymptotically distribution-free) option makes no assumptions about the data-generating process and is recommended. The `elliptical` option assumes the rating data are elliptically distributed, and the `normal` option assumes they are normally distributed. There are also three transforms for the confidence intervals: `none`, `fisher`, and `arcsin`. The Fisher and arcsine transforms appear to work slightly better in practice.
All examples in this vignette use categorical data, but the package also supports continuous data, which is most meaningful for Fleiss’ kappa, Conger’s kappa, and the new Brennan-Prediger coefficient (i.e., with `kind = 2`).
This package supports missing data in the following sense: the same set of raters rates every item, but some ratings may be missing for some of the rows. An example is the Klein (2018) data:
knitr::kable(dat.klein2018)
rater1 | rater2 | rater3 | rater4 | rater5 |
---|---|---|---|---|
1 | 2 | 2 | NA | 2 |
1 | 1 | 3 | 3 | 3 |
3 | 3 | 3 | 3 | 3 |
1 | 1 | 1 | 1 | 3 |
1 | 1 | 1 | 3 | 3 |
1 | 2 | 2 | 2 | 2 |
1 | 1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 | 3 |
1 | 3 | NA | NA | 3 |
1 | 1 | 1 | 3 | 3 |
Here missing ratings are marked as `NA`. The estimators of the coefficients are consistent, and inference is based on pairwise available data using the method of van Praag (1985):
fleissci(dat.klein2018)
#> Call: fleissci(x = dat.klein2018)
#>
#> 95% confidence interval (n = 10).
#> 0.025 0.975
#> -0.1549052 0.6048678
#>
#> Sample estimates.
#> kappa sd
#> 0.2249813 0.5037932
Inference is not supported for data with fewer than … ratings per pair of raters. The following data from Gwet (2014) are an example:
knitr::kable(dat.gwet2014)
rater1 | rater2 | rater3 | rater4 | rater5 |
---|---|---|---|---|
1 | 1 | 2 | NA | 2 |
1 | 1 | 0 | 1 | NA |
2 | 3 | 3 | 3 | NA |
NA | 0 | 0 | NA | 0 |
0 | 0 | 0 | NA | 0 |
0 | 0 | 0 | NA | 0 |
1 | 0 | 2 | NA | 1 |
1 | NA | 2 | 0 | NA |
2 | 2 | 2 | NA | 2 |
2 | 1 | 1 | 1 | NA |
NA | 1 | 0 | 0 | NA |
0 | 0 | 0 | 0 | NA |
1 | 2 | 2 | 2 | NA |
3 | 3 | 2 | 2 | 3 |
1 | 1 | 1 | NA | 1 |
1 | 1 | 1 | NA | 1 |
2 | 1 | 2 | NA | 2 |
1 | 2 | 3 | 3 | NA |
1 | 1 | 0 | 1 | NA |
0 | 0 | 0 | NA | 0 |
Observe that only one row is rated by both rater 4 and rater 5. Hence you get an error when calling
fleissci(dat.gwet2014)
#> Error in fun(calc): The data does not contain sufficient non-NAs.
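When data have missing ratings, it helps to check how many items each pair of raters has rated jointly before calling `fleissci()`. The base R sketch below does this with the cross-product of the indicator matrix of non-missing ratings. Each off-diagonal entry is the number of items both raters rated, and the diagonal holds each rater’s own count. The Gwet (2014) ratings are typed in from the table above, so no package is needed. `gwet`, `present`, and `pair_counts` are illustrative names only.

```r
# Gwet (2014) ratings from the table above; NA marks a missing rating
gwet <- matrix(c(
   1,  1,  2, NA,  2,
   1,  1,  0,  1, NA,
   2,  3,  3,  3, NA,
  NA,  0,  0, NA,  0,
   0,  0,  0, NA,  0,
   0,  0,  0, NA,  0,
   1,  0,  2, NA,  1,
   1, NA,  2,  0, NA,
   2,  2,  2, NA,  2,
   2,  1,  1,  1, NA,
  NA,  1,  0,  0, NA,
   0,  0,  0,  0, NA,
   1,  2,  2,  2, NA,
   3,  3,  2,  2,  3,
   1,  1,  1, NA,  1,
   1,  1,  1, NA,  1,
   2,  1,  2, NA,  2,
   1,  2,  3,  3, NA,
   1,  1,  0,  1, NA,
   0,  0,  0, NA,  0
), ncol = 5, byrow = TRUE, dimnames = list(NULL, paste0("rater", 1:5)))

# 1 where a rating is present, 0 where it is missing
present <- (!is.na(gwet)) * 1
# Entry [r, s]: number of items rated by both rater r and rater s
pair_counts <- crossprod(present)
pair_counts
pair_counts["rater4", "rater5"]  # 1
```

The single 1 in the `rater4`/`rater5` cell corresponds to the pair behind the error above. Every other pair of raters shares at least nine items.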
Aggregated data
Data in aggregated form look different: each row holds the number of ratings the item received in each category. The following example, `dat.fleiss1971`, is from Fleiss (1971). Only the first six of its 30 rows are shown.
depression | personality disorder | schizophrenia | neurosis | other |
---|---|---|---|---|
0 | 0 | 0 | 6 | 0 |
0 | 3 | 0 | 0 | 3 |
0 | 1 | 4 | 0 | 1 |
0 | 0 | 0 | 0 | 6 |
0 | 3 | 0 | 3 | 0 |
2 | 0 | 4 | 0 | 0 |
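To turn raw data into aggregated form, count for each item how many ratings fall into each category. The sketch below uses a hypothetical helper `to_aggregated()`, which is not part of quadagree, and applies it to the six rows of the Zapf et al. (2016) table shown earlier. Missing ratings are not counted, so row totals can vary. The conversion also discards which rater gave which rating.

```r
# Hypothetical helper (not part of quadagree): convert an items-by-raters
# matrix of ratings to an items-by-categories matrix of counts.
# table() ignores NA, so missing ratings are simply not counted.
to_aggregated <- function(x, levels) {
  counts <- t(apply(x, 1, function(row) table(factor(row, levels = levels))))
  colnames(counts) <- levels
  counts
}

zapf_head <- rbind(c(5, 5, 4, 5),
                   c(1, 1, 1, 1),
                   c(5, 5, 5, 5),
                   c(1, 3, 3, 3),
                   c(5, 5, 5, 5),
                   c(1, 1, 1, 1))
to_aggregated(zapf_head, levels = 1:5)
```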
The dataset has categories
colnames(dat.fleiss1971)
#> [1] "depression" "personality disorder" "schizophrenia"
#> [4] "neurosis" "other"
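Provided we understand `depression` as being `1`, `personality disorder` as being `2`, and so on, we can calculate and do inference on the quadratically weighted Fleiss’ kappa for data in this form. Conger’s kappa is not available, because it needs to know which rater gave which rating, and aggregation discards that information.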
fleissci_aggr(dat.fleiss1971)
#> Call: fleissci_aggr(x = dat.fleiss1971)
#>
#> 95% confidence interval (n = 30).
#> 0.025 0.975
#> 0.05668483 0.51145967
#>
#> Sample estimates.
#> kappa.xtx sd
#> 0.2840722 0.5987194
The results from the aggregated functions agree with those of the irrCAC package, even though the underlying methodology differs.
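For aggregated data, the same plug-in quantities can be computed directly from the counts once each category is given a numeric value. Reading `depression` as 1, `personality disorder` as 2, and so on does exactly that. The sketch below uses a hypothetical helper `fleiss_quad_aggr()`, not the package’s `fleissci_aggr()`. Its `values` argument plays the role of the category values. It follows the classical Fleiss construction and may differ from the package’s estimator in small-sample details. Each item needs at least two ratings.

```r
# Hypothetical helper (not part of quadagree): plug-in quadratically weighted
# Fleiss' kappa from an items-by-categories matrix of counts.
# `values` is the numeric value attached to each category (column).
fleiss_quad_aggr <- function(counts, values = seq_len(ncol(counts))) {
  counts <- as.matrix(counts)
  m <- rowSums(counts)  # number of ratings per item
  stopifnot(length(values) == ncol(counts), all(m >= 2))
  # Within-item sample variance, computed from the counts
  mu <- drop(counts %*% values) / m
  ss <- drop(counts %*% values^2) - m * mu^2
  within <- mean(ss / (m - 1))
  # Population variance of the pooled distribution of all ratings
  p <- colSums(counts) / sum(counts)
  pooled_mean <- sum(p * values)
  pooled_var <- sum(p * values^2) - pooled_mean^2
  1 - within / pooled_var
}

# Counts for three categories (4 items, 3 ratings each)
counts <- rbind(c(2, 1, 0),
                c(0, 0, 3),
                c(2, 1, 0),
                c(0, 1, 2))
fleiss_quad_aggr(counts)                       # 71/107, about 0.664
fleiss_quad_aggr(counts, values = c(1, 2, 5))  # 335/467, about 0.717
```

With quadratic weights, rescaling all values (say, to 2, 4, 6) leaves the coefficient unchanged, but changing their relative spacing does not. This is why the choice of values matters.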
The main contributions of this package for aggregated data are the ability to calculate the new variant of the Brennan-Prediger coefficient, the support for a `values` vector telling the program what value to attach to each category, and the support for transforms. The calculations are also marginally faster, about … times, but this is unlikely to matter in practice. Note that inference for missing data in aggregated form is not supported. Data in aggregated form can, however, be bootstrapped, and the transforms `none`, `fisher`, and `arcsin` are supported. The `type` options used for data in raw (wide) form do not make sense in this context.