Benchmark of Fleiss' kappa, Cohen's kappa, and the Brennan-Prediger coefficient
Benchmark for the non-aggregated functions
Jonas Moss
4/25/2023
benchmarks_long.rmd
The main benefit of quadagree over irrCAC for data in long form is its support for missing data and continuous data, but bootstrapping and transformations can come in handy as well.
As it turns out, quadagree is slightly faster than irrCAC for most practical data sizes. The largest difference is found for Conger’s kappa, as the method used by irrCAC works for any weighting function and must therefore do much more work for Conger’s kappa. For quadagree, Fleiss’s kappa and Conger’s kappa are equally fast.
The following benchmark is run on 50 ratings and 4 raters.
library("quadagree")
x <- dat.zapf2016
irr_conger <- \(x) irrCAC::conger.kappa.raw(x, weights = "quadratic")
irr_fleiss <- \(x) irrCAC::fleiss.kappa.raw(x, weights = "quadratic")
irr_bp <- \(x) irrCAC::bp.coeff.raw(x, weights = "quadratic")
microbenchmark::microbenchmark(
irr_conger(x),
congerci(x),
irr_fleiss(x),
fleissci(x),
irr_bp(x),
bpci(x),
times = 1000
)
## Unit: microseconds
## expr min lq mean median uq max neval
## irr_conger(x) 1251.396 1297.387 1376.6267 1312.595 1336.6350 13004.414 1000
## congerci(x) 370.131 401.379 437.6575 417.174 428.8305 3877.338 1000
## irr_fleiss(x) 662.847 697.096 779.4348 707.851 721.8075 8714.877 1000
## fleissci(x) 366.133 397.948 420.8645 411.658 423.5655 3446.813 1000
## irr_bp(x) 649.272 682.263 741.6886 693.820 710.0455 4340.332 1000
## bpci(x) 397.572 430.624 470.3892 445.702 457.1540 4023.059 1000
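To make the table above easier to read, the following sketch computes how many times faster quadagree is than irrCAC at the median. The medians are copied by hand from the output above; the data frame and its column names are illustrative only and are not part of either package.

```r
# Median timings in microseconds, copied from the benchmark output above.
medians <- data.frame(
  coefficient = c("Conger", "Fleiss", "Brennan-Prediger"),
  irrCAC      = c(1312.595, 707.851, 693.820),
  quadagree   = c(417.174, 411.658, 445.702)
)

# Speedup factor: irrCAC median divided by quadagree median.
medians$speedup <- round(medians$irrCAC / medians$quadagree, 2)
print(medians)
```

A factor above 1 means quadagree was faster in this particular run; the exact values will vary between machines and runs.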
Let’s try 500 ratings.
y <- rbind(x, x, x, x, x, x, x, x, x, x)
# 500 ratings
microbenchmark::microbenchmark(
irr_conger(y),
congerci(y),
irr_fleiss(y),
fleissci(y),
irr_bp(y),
bpci(y),
times = 1000
)
## Unit: microseconds
## expr min lq mean median uq max neval
## irr_conger(y) 1953.326 2037.0375 2351.9054 2066.0815 2102.8500 58641.427 1000
## congerci(y) 457.013 502.8190 537.7253 522.8710 543.8705 4083.753 1000
## irr_fleiss(y) 788.131 830.8805 875.0251 847.4120 866.4730 4839.012 1000
## fleissci(y) 458.465 498.5860 533.6132 518.4475 542.4325 3856.188 1000
## irr_bp(y) 764.878 815.3120 868.8832 831.8830 851.6340 4550.654 1000
## bpci(y) 543.104 589.3950 633.4511 608.9765 635.2210 5555.360 1000
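For 5000 ratings, the median timings below show quadagree to be roughly 8.5 times faster than irrCAC for Conger’s kappa and roughly 1.5 times faster for Fleiss’s kappa, while the two packages are about equally fast for the Brennan-Prediger coefficient.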
z <- rbind(y, y, y, y, y, y, y, y, y, y)
# 5000 ratings
microbenchmark::microbenchmark(
irr_conger(z),
congerci(z),
irr_fleiss(z),
fleissci(z),
irr_bp(z),
bpci(z)
)
## Unit: milliseconds
## expr min lq mean median uq max
## irr_conger(z) 16.605005 19.614978 20.086820 19.975275 20.263523 71.729186
## congerci(z) 2.221176 2.299062 2.435888 2.349476 2.451561 5.915702
## irr_fleiss(z) 3.474505 3.579702 4.054850 3.645415 3.733063 13.285418
## fleissci(z) 2.202451 2.290601 2.478628 2.356649 2.484287 5.746637
## irr_bp(z) 3.409935 3.506064 3.924030 3.557545 3.618785 10.126331
## bpci(z) 3.191246 3.298451 3.612792 3.381982 3.452605 6.659040
## neval
## 100
## 100
## 100
## 100
## 100
## 100
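One rough way to see how each function scales with the number of ratings is to compare its median time at 5000 ratings with its median time at 50 ratings. The sketch below does this with medians copied by hand from the outputs above, with the 5000-rating values converted from milliseconds to microseconds. The object names are illustrative.

```r
# Median timings in microseconds at 50 and 5000 ratings,
# copied from the benchmark outputs above.
med_us <- rbind(
  irr_conger = c(1312.595, 19975.275),
  congerci   = c(417.174,  2349.476),
  irr_fleiss = c(707.851,  3645.415),
  fleissci   = c(411.658,  2356.649),
  irr_bp     = c(693.820,  3557.545),
  bpci       = c(445.702,  3381.982)
)
colnames(med_us) <- c("n50", "n5000")

# Growth factor: how much longer each call takes with 100 times as many rows.
growth <- med_us[, "n5000"] / med_us[, "n50"]
print(round(sort(growth, decreasing = TRUE), 1))
```

In this run, irr_conger grows by far the most, which is in line with the remark that irrCAC has to do much more work for Conger’s kappa.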
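If we increase the number of raters, the differential becomes very large, but in the other direction: quadagree is now slower than irrCAC for all three coefficients, and roughly 7 to 9 times slower for Fleiss’s kappa and the Brennan-Prediger coefficient.
w <- cbind(y, y, y, y, y, y, y, y, y, y)
# 500 ratings and 40 raters.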
microbenchmark::microbenchmark(
irr_conger(w),
congerci(w),
irr_fleiss(w),
fleissci(w),
irr_bp(w),
bpci(w)
)
## Unit: milliseconds
## expr min lq mean median uq max
## irr_conger(w) 15.279891 18.056969 20.821602 18.474173 19.064069 72.911493
## congerci(w) 20.324657 23.242999 26.181156 24.336600 26.389737 81.909648
## irr_fleiss(w) 2.939937 3.060958 3.480633 3.210573 3.291263 7.502654
## fleissci(w) 20.260156 23.249191 26.381585 24.020200 26.692999 78.435323
## irr_bp(w) 2.888512 3.045229 3.755012 3.154713 3.230786 10.834052
## bpci(w) 23.700528 26.757284 32.093062 28.027285 30.285835 85.202615
## neval
## 100
## 100
## 100
## 100
## 100
## 100
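The effect of widening the data can be quantified by comparing the medians for y and w, which have the same number of rows but ten times as many columns. The sketch below uses medians copied by hand from the outputs above, with the values for w converted from milliseconds to microseconds. The object names are illustrative.

```r
# Median timings in microseconds for y (500 rows) and w (same rows,
# ten times as many columns), copied from the benchmark outputs above.
med_us <- cbind(
  y = c(irr_conger = 2066.0815, congerci = 522.8710,
        irr_fleiss = 847.4120,  fleissci = 518.4475,
        irr_bp     = 831.8830,  bpci     = 608.9765),
  w = c(18474.173, 24336.600, 3210.573, 24020.200, 3154.713, 28027.285)
)

# Growth factor when going from y to w.
growth <- med_us[, "w"] / med_us[, "y"]
print(round(growth, 1))
```

In this run the quadagree functions slow down much more than the irrCAC functions when columns are added, which is why the ordering of the two packages flips for the wide data set.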