Confidence intervals for quadratic agreement coefficients, with optional bootstrapping and transforms. Based on the formulas of Moss and van Oest (wip) and Moss (wip), along with standard asymptotic theory (Magnus & Neudecker, 2019) and the missing data theory of van Praag et al. (1985).


  type = c("adf", "elliptical", "normal", "unbiased"),
  transform = "none",
  conf_level = 0.95,
  alternative = c("two.sided", "greater", "less"),
  bootstrap = FALSE,
  n_reps = 1000

  values = NULL,
  kind = 1,
  type = c("adf", "elliptical", "normal", "unbiased"),
  transform = "none",
  conf_level = 0.95,
  alternative = c("two.sided", "greater", "less"),
  bootstrap = FALSE,
  n_reps = 1000

  values = seq_len(ncol(x)),
  transform = "none",
  conf_level = 0.95,
  alternative = c("two.sided", "greater", "less"),
  bootstrap = FALSE,
  n_reps = 1000

  values = seq_len(ncol(x)),
  kind = 1,
  transform = "none",
  conf_level = 0.95,
  alternative = c("two.sided", "greater", "less"),
  bootstrap = FALSE,
  n_reps = 1000


bp(x, values = stats::na.omit(unique(c(x))), kind = 1)



fleiss_aggr(x, values = seq_len(ncol(x)))

bp_aggr(x, values = seq_len(ncol(x)), kind = 1)



Input data that can be converted to a matrix using as.matrix.


Type of confidence interval. Either adf, elliptical, or normal. Ignored in fleiss_aggrci.


One of "none", "log", "fisher", and "arcsin". Defaults to "none".


Confidence level. Defaults to 0.95.


A character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".


If TRUE, performs a studentized bootstrap with n_reps repetitions. Defaults to FALSE.


Number of bootstrap samples if bootstrap = TRUE. Ignored if bootstrap = FALSE. Defaults to 1000.


Values to attach to each column of the Fleiss form data. Defaults to 1:C, where C is the number of categories. Only used in fleiss_aggr and bp_aggr.


The kind of Brennan-Prediger coefficient used, 1 for the classical kind and 2 for the kind introduced in Moss (2023). Only relevant for bp_aggr and bp.


A vector of class quadagree containing the endpoints of the confidence interval. The arguments of the function call are included as attributes.


There are two kinds of functions. Functions ending in aggr take data in aggregated form, where each row contains the number of raters who selected each category; the data set dat.fleiss1971 is an example. Missing data is not supported by the aggr functions. The other functions take data in long form, where each row contains the ratings of every rater; the data sets dat.zapf2016 and dat.klein2018 are examples. For these functions, both missing data and continuous data are supported. See the usage vignette for more information.
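The relation between the two data forms can be sketched in base R. The matrix `long`, the vector `categories`, and the column names below are hypothetical, and the ratings are assumed to be integers without missing values.

```r
# Hypothetical long-form ratings: 4 items (rows) rated by 3 raters (columns).
long <- matrix(
  c(1, 1, 2,
    3, 3, 3,
    2, 1, 2,
    1, 3, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("rater", 1:3))
)

# Categories are assumed to be the integers 1, 2, 3.
categories <- 1:3

# Count how many raters chose each category, one row per item.
aggregated <- t(apply(long, 1, function(r) {
  tabulate(match(r, categories), nbins = length(categories))
}))
colnames(aggregated) <- paste0("cat", categories)

aggregated
```

Each row of `aggregated` sums to the number of raters, which is why this form cannot represent items with missing ratings without further conventions.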

For data in long form, the methods handle missing data using pairwise available information, i.e., the option use = "pairwise.complete.obs" in stats::cov(), along with the asymptotic theory of van Praag et al. (1985). The bootstrap option uses the studentized bootstrap (Efron, 1987), which is second-order correct. Both functions make use of future.apply when bootstrapping.
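To illustrate what pairwise available information means, the following sketch compares listwise and pairwise deletion in stats::cov(). The matrix `x` is hypothetical, and the sketch only shows the covariance step, not the asymptotic theory built on top of it.

```r
# Hypothetical long-form ratings from 3 raters with some missing values.
x <- cbind(
  r1 = c(1, 2, 3, 4, 5, NA),
  r2 = c(2, 2, NA, 4, 4, 1),
  r3 = c(1, NA, 3, 5, 5, 2)
)

# Listwise deletion keeps only rows without any NA (here 3 of 6 rows).
cov_complete <- stats::cov(x, use = "complete.obs")

# Pairwise deletion uses, for each pair of raters, every row where
# both ratings are present.
cov_pairwise <- stats::cov(x, use = "pairwise.complete.obs")

# Number of rows contributing to each entry of the pairwise covariance.
observed <- 1 - is.na(x)
n_pairwise <- crossprod(observed)
n_pairwise
```

In this toy data every pair of raters shares four rows, while only three rows are complete, so the pairwise estimate uses more of the available ratings.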

The type argument defaults to adf (asymptotically distribution-free), which is consistent when the fourth moment is finite. The normal option assumes normality and is not consistent for models with nonzero excess kurtosis. The elliptical option assumes an elliptical or pseudo-elliptical distribution of the data; the resulting confidence intervals are normal-theory intervals with a kurtosis correction (Yuan & Bentler, 2002). The common kurtosis parameter is calculated using the unbiased sample kurtosis (Joanes & Gill, 1998).
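As a rough illustration of the kurtosis estimator, here is a univariate sketch of the unbiased sample excess kurtosis (the estimator denoted G2 by Joanes and Gill, 1998, assumed here to be the one the text refers to). The helper name `unbiased_kurtosis` is hypothetical, and how the package combines kurtosis across raters into a common parameter is not shown.

```r
# Unbiased sample excess kurtosis for a single numeric vector.
# `unbiased_kurtosis` is a hypothetical helper, not a package function.
unbiased_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n > 3)
  m2 <- mean((x - mean(x))^2)  # second central moment (divisor n)
  m4 <- mean((x - mean(x))^4)  # fourth central moment (divisor n)
  g2 <- m4 / m2^2 - 3          # plain moment-based excess kurtosis
  (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
}

set.seed(1)
unbiased_kurtosis(rnorm(500))  # expected to be near 0 for normal data
unbiased_kurtosis(rexp(500))   # expected to be clearly positive
```

A value far from 0 suggests that the normal option is a poor fit and that adf or elliptical may be more appropriate.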

Conger's (1980) kappa is a multi-rater generalization of Cohen's kappa. All functions in this package work for multiple raters, so functions starting with cohen or conger are aliases of each other. The quadratically weighted Cohen's kappa is also known as Lin's concordance coefficient.
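The equivalence with Lin's concordance coefficient can be checked numerically for two raters. The sketch below uses hypothetical ratings `x` and `y` and population-style moments (divisor n rather than n - 1); it does not call the package.

```r
# Hypothetical ratings from two raters on 6 items.
x <- c(1, 2, 3, 4, 5, 3)
y <- c(1, 3, 3, 5, 4, 2)

# Quadratically weighted Cohen's kappa: 1 - observed / chance disagreement.
# Chance disagreement pairs every rating of x with every rating of y.
observed <- mean((x - y)^2)
chance <- mean(outer(x, y, function(a, b) (a - b)^2))
kappa_quadratic <- 1 - observed / chance

# Lin's concordance coefficient with moments normalized by n.
cov_xy <- mean((x - mean(x)) * (y - mean(y)))
var_x <- mean((x - mean(x))^2)
var_y <- mean((y - mean(y))^2)
lin_ccc <- 2 * cov_xy / (var_x + var_y + (mean(x) - mean(y))^2)

all.equal(kappa_quadratic, lin_ccc)
```

The two quantities agree because the chance disagreement expands to the sum of the two variances plus the squared difference in means.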

The only difference between Cohen's kappa and Fleiss' kappa lies in how they measure disagreement due to chance. Fleiss' kappa marginalizes the rating distribution across raters, essentially assuming that all raters share the same rating distribution, while Cohen's kappa does not. There is a large literature comparing Fleiss' kappa to Cohen's kappa, and there is no consensus on which to prefer.
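The difference in chance disagreement can be made concrete for two raters with quadratic weights. The following is a sketch on hypothetical ratings using plain plug-in moments; the package's estimators may differ in finite-sample details.

```r
# Hypothetical ratings from two raters; rater y tends to rate higher.
x <- c(1, 2, 3, 4, 5, 3)
y <- c(2, 3, 4, 5, 5, 4)

sq_diff <- function(a, b) (a - b)^2
observed <- mean(sq_diff(x, y))

# Cohen/Conger: chance pairs one rating from each rater's own distribution.
chance_cohen <- mean(outer(x, y, sq_diff))

# Fleiss: chance pairs two ratings from the distribution pooled over raters.
pooled <- c(x, y)
chance_fleiss <- mean(outer(pooled, pooled, sq_diff))

kappa_cohen <- 1 - observed / chance_cohen
kappa_fleiss <- 1 - observed / chance_fleiss
c(cohen = kappa_cohen, fleiss = kappa_fleiss)
```

In this two-rater sketch the two chance terms differ by half the squared difference between the raters' means, so the coefficients coincide when the raters have equal means.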

The aggregated functions take an argument values, which specifies the numerical value to attach to each category. The default for values is 1, ..., C, where C is the number of categories.
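The effect of values can be illustrated by computing a per-item quadratic disagreement from category counts. The matrix `counts` and the helper `item_disagreement` are hypothetical and only meant to show how the numerical scoring enters; they are not the package's internals.

```r
# Hypothetical aggregated data: 3 items, 4 raters, 3 categories.
counts <- rbind(
  c(4, 0, 0),
  c(1, 2, 1),
  c(0, 1, 3)
)

# Mean squared difference between two distinct raters on one item,
# given the category counts and the numerical value of each category.
item_disagreement <- function(n_c, values) {
  ratings <- rep(values, times = n_c)
  d <- outer(ratings, ratings, function(a, b) (a - b)^2)
  sum(d) / (length(ratings) * (length(ratings) - 1))
}

# Default scoring 1, 2, 3 versus a hypothetical scoring that places
# the third category far from the other two.
default_scores <- apply(counts, 1, item_disagreement, values = 1:3)
spread_scores <- apply(counts, 1, item_disagreement, values = c(1, 2, 10))
rbind(default_scores, spread_scores)
```

Items where all raters agree have zero disagreement under any scoring, while items involving the third category are penalized more heavily under the second scoring.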

The Brennan-Prediger functions take an argument kind. If kind equals 1, the traditional Brennan-Prediger coefficient is returned. If kind equals 2, the new Brennan-Prediger coefficient of Moss (wip) is returned.
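A minimal sketch of the traditional coefficient with quadratic weights, assuming that chance disagreement is computed under a uniform distribution over values (the usual reading of Brennan and Prediger's proposal). The ratings are hypothetical, and the coefficient for kind = 2 is not sketched because its definition is not given here.

```r
# Hypothetical ratings from two raters on a 1-5 scale.
x <- c(1, 2, 3, 4, 5, 3)
y <- c(1, 3, 3, 5, 4, 2)
values <- 1:5

sq_diff <- function(a, b) (a - b)^2
observed <- mean(sq_diff(x, y))

# Assumption: chance disagreement draws both ratings uniformly from `values`,
# independently of the observed rating distributions.
chance_uniform <- mean(outer(values, values, sq_diff))

bp_classical <- 1 - observed / chance_uniform
bp_classical
```

Because the chance term depends only on values, this coefficient is sensitive to which categories are declared possible, even if some are never used.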


# Fleiss' kappa for data on long form
fleissci(dat.zapf2016)
#> Call: fleissci(x = dat.zapf2016)
#> 95% confidence interval (n = 50).
#>     0.025     0.975 
#> 0.8418042 0.9549730 
#> Sample estimates.
#>     kappa        sd 
#> 0.8983886 0.1971016 

# Brennan-Prediger for data on aggregated form
bpci_aggr(dat.fleiss1971)
#> Call: bpci_aggr(x = dat.fleiss1971)
#> 95% confidence interval (n = 30).
#>     0.025     0.975 
#> 0.1219674 0.5458104 
#> Sample estimates.
#>     kappa        sd 
#> 0.3338889 0.5579971 

# Conger's (Cohen's) kappa for data on long form with missing values
congerci(dat.klein2018)
#> Call: congerci(x = dat.klein2018)
#> 95% confidence interval (n = 10).
#>       0.025       0.975 
#> -0.05610579  0.60352456 
#> Sample estimates.
#>     kappa        sd 
#> 0.2737094 0.4373903
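
The printed intervals appear consistent with a two-sided t-type interval built from the estimate, the reported sd, and n. The sketch below reproduces the first interval from the printed numbers alone; the scaling by n - 1 and the use of a t quantile are inferred from the output, not taken from the package source, and only the default settings (no transform, no bootstrap, two-sided) are covered.

```r
# Numbers copied from the printed output of fleissci(x = dat.zapf2016).
estimate <- 0.8983886
sd_hat <- 0.1971016
n <- 50
conf_level <- 0.95

# Two-sided t-type interval. The n - 1 scaling is an assumption inferred
# from the printed numbers, not a documented formula.
alpha <- 1 - conf_level
half_width <- stats::qt(1 - alpha / 2, df = n - 1) * sd_hat / sqrt(n - 1)
ci <- c(lower = estimate - half_width, upper = estimate + half_width)
round(ci, 7)
```

The same calculation with the printed values for the other two examples can be used as a quick sanity check when reading the output.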