Estimation and inference for the quadratically weighted Fleiss' kappa and Conger's (Cohen's) kappa
quadagree.Rd
Confidence intervals for quadratic agreement coefficients with optional bootstrapping and transforms. Based on the formulas of Moss and van Oest (wip) and Moss (wip) along with standard asymptotic theory (Magnus, Neudecker, 2019) and the missing data theory of van Praag et al. (1985).
Usage
fleissci(
x,
type = c("adf", "elliptical", "normal", "unbiased"),
transform = "none",
conf_level = 0.95,
alternative = c("two.sided", "greater", "less"),
bootstrap = FALSE,
n_reps = 1000
)
congerci(
x,
type = c("adf", "elliptical", "normal", "unbiased"),
transform = "none",
conf_level = 0.95,
alternative = c("two.sided", "greater", "less"),
bootstrap = FALSE,
n_reps = 1000
)
cohenci(
x,
type = c("adf", "elliptical", "normal", "unbiased"),
transform = "none",
conf_level = 0.95,
alternative = c("two.sided", "greater", "less"),
bootstrap = FALSE,
n_reps = 1000
)
bpci(
x,
values = NULL,
kind = 1,
type = c("adf", "elliptical", "normal", "unbiased"),
transform = "none",
conf_level = 0.95,
alternative = c("two.sided", "greater", "less"),
bootstrap = FALSE,
n_reps = 1000
)
fleissci_aggr(
x,
values = seq_len(ncol(x)),
transform = "none",
conf_level = 0.95,
alternative = c("two.sided", "greater", "less"),
bootstrap = FALSE,
n_reps = 1000
)
bpci_aggr(
x,
values = seq_len(ncol(x)),
kind = 1,
transform = "none",
conf_level = 0.95,
alternative = c("two.sided", "greater", "less"),
bootstrap = FALSE,
n_reps = 1000
)
fleiss(x)
bp(x, values = stats::na.omit(unique(c(x))), kind = 1)
conger(x)
cohen(x)
fleiss_aggr(x, values = seq_len(ncol(x)))
bp_aggr(x, values = seq_len(ncol(x)), kind = 1)
Arguments
- x
Input data data can be converted to a matrix using
as.matrix
.- type
Type of confidence interval. Either
adf
,elliptical
, ornormal
. Ignored infleiss_aggrci
.- transform
One of
"none"
,"log"
,"fisher"
, and"arcsin
. Defaults to"none"
.- conf_level
Confidence level. Defaults to
0.95
.- alternative
A character string specifying the alternative hypothesis, must be one of
"two.sided"
(default),"greater"
or"less"
.- bootstrap
If
TRUE
, performs a studentized bootstrap withn_reps
repetitions. Defaults toFALSE
.- n_reps
Number of bootstrap samples if
bootstrap = TRUE
. Ignored ifbootstrap = FALSE
. Defaults to1000
.- values
to attach to each column on the Fleiss form data. Defaults to
1:C
, whereC
is the number of categories. Only used infleiss_aggr
andbp_aggr
.- kind
The kind of Brennan-Prediger coefficient used,
1
for the classical kind and2
for the kind introduced in Moss (2023). Only relevant forbp_aggr
andbp
.
Value
A vector of class quadagree
containing the confidence end points.
The arguments of the function call are included as attributes.
Details
There are two kinds of functions. The functions ending in aggr
should be
applied to data on aggregated form, where each row contains the number
of selected ratings for each category. The data set dat.fleiss1971
provides an example. Missing data is not supported for the aggr
functions. The other functions should be applied to data on long form,
where each row contains the ratings of every rater. The data sets
dat.zapf2016
and dat.klein2018
are examples. Missing data, and
continuous data, is supported. See the usage vignette for
more information.
For data on long form, the methods handle missing data using pairwise
available information, i.e., the option use = "pairwise.complete.obs"
in stats::cov()
along with the asymptotic theory of
van Praag et al. (1985). The bootstrap option
uses the studentized bootstrap (Efron, B. 1987), which is second order
correct. Both functions makes use of future.apply
when bootstrapping.
The type
variables defaults to adf
, asymptotically distribution-free,
which is consistent when the fourth moment is finite. The normal
option
assumes normality, and is not consistent for models with excess
kurtosis unequal to 0
. The elliptical
option assumes an
elliptical or pseudo-elliptical distribution of the data. The resulting
confidence intervals are corrected variants of the normal theory
intervals with a kurtosis correction (Yuan & Bentler 2002). The
common kurtosis parameter is calculated using the unbiased sample
kurtosis (Joanes, 1998).
Conger's (1980) kappa is a multi-rater generalization of Cohen's kappa.
All functions in this package work for multiple raters, so functions
starting with cohen
or conger
are aliases. The quadratically
weighted Cohen's kappa is also known as Lin's concordance coefficient.
The only difference between Cohen's kappa and Fleiss' kappa lies on how they measure disagreement due to chance. Here Fleiss' marginalizes the rating distribution across raters, essentially assuming there is no difference in the rating distribution across raters, while Cohen's kappa does not. There is a large literature comparing Fleiss' kappa to Cohen's kappa, and there is no consensus on which to prefer.
The aggregated functions takes an argument values
, which specifies what
numerical value to attach to each category. The default value for values
is 1...C
, where C
is the number of categories.
The Brennan-Prediger coefficients take an argument kind
. If equal to 1
,
it returns the traditional Brennan-Prediger coefficient. If kind
equals 2
, it returns the new Brennan-Prediger coefficient of Moss (wip).
References
Efron, B. (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association, 82(397), 171-185. https://doi.org/10.2307/2289144
Van Praag, B. M. S., Dijkstra, T. K., & Van Velzen, J. (1985). Least-squares theory based on general distributional assumptions with an application to the incomplete observations problem. Psychometrika, 50(1), 25-36. https://doi.org/10.1007/BF02294145
Joanes, D. N., & Gill, C. A. (1998). Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 47(1), 183-189. https://doi.org/10.1111/1467-9884.00122
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213-220. https://doi.org/10.1037/h0026256
Fleiss, J. L. (1975). Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 31(3), 651-659. https://www.ncbi.nlm.nih.gov/pubmed/1174623
Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88(2), 322-328. https://doi.org/10.1037/0033-2909.88.2.322
Lin, L. I. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45(1), 255-268. https://www.ncbi.nlm.nih.gov/pubmed/2720055
Moss, van Oest (work in progress). Inference for quadratically weighted multi-rater kappas with missing raters.
Moss (work in progress). On the Brennan-Prediger coefficients.
Magnus, J. R., & Neudecker, H. (2019). Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons. https://doi.org/10.1002/9781119541219
Examples
library("quadagree")
# Fleiss' kappa for data on long form
fleissci(dat.zapf2016)
#> Call: fleissci(x = dat.zapf2016)
#>
#> 95% confidence interval (n = 50).
#> 0.025 0.975
#> 0.8418042 0.9549730
#>
#> Sample estimates.
#> kappa sd
#> 0.8983886 0.1971016
# Brennan-Prediger for data on aggregated form
bpci_aggr(dat.fleiss1971)
#> Call: bpci_aggr(x = dat.fleiss1971)
#>
#> 95% confidence interval (n = 30).
#> 0.025 0.975
#> 0.1219674 0.5458104
#>
#> Sample estimates.
#> kappa sd
#> 0.3338889 0.5579971
# Conger's (Cohen's) kappa for data on long form with missing values
congerci(dat.klein2018)
#> Call: congerci(x = dat.klein2018)
#>
#> 95% confidence interval (n = 10).
#> 0.025 0.975
#> -0.05610579 0.60352456
#>
#> Sample estimates.
#> kappa sd
#> 0.2737094 0.4373903