
How Reliable is Performance-Based Assessment? Comparing Holistic, Analytic, and Comparative Judgment Approaches

Sievers, Charlotte (2023) How Reliable is Performance-Based Assessment? Comparing Holistic, Analytic, and Comparative Judgment Approaches. Master thesis, Psychology.


Abstract

Assessment is crucial in education, and establishing its reliability is vital for ensuring consistent and accurate evaluation. Two common assessment approaches are the analytic and holistic methods, yet educational research has not reached a consensus on which of them is more reliable. A promising emerging approach is comparative judgment (CJ), in which raters make relative rather than absolute judgments. Because the analytic method standardizes scoring by specifying performance dimensions, whereas the holistic approach relies on intuitive judgment that can often be incoherent, we hypothesized that the analytic method would show the highest inter-rater reliability. CJ assessment, which derives a rank order from multiple pairwise comparisons, was expected to be more reliable than the holistic approach. Using a between-subjects design, a convenience sample of N = 135 undergraduate students assessed 30 short written essays. Raters’ perceived complexity of applying each method and their general decision-making style (holistic vs. analytic) were also explored to identify possible barriers to the practical implementation of the most reliable method. Inter-rater reliability for the analytic and holistic methods, estimated with the intraclass correlation coefficient, was low to moderate. Reliability for the CJ method, assessed with scale separation reliability and Pearson's product-moment correlation, was high. Significant differences in perceived complexity and decision-making tendencies were found. Because of this study’s methodological limitations, definitive implications are difficult to draw. Future research should examine whether different reliability measures can validly be compared and should use a better-adapted rubric with fewer criteria.
Keywords: Performance-based assessment, convenience sample, inter-rater reliability, rater perceptions, intraclass correlation, scale separation reliability
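The abstract does not state which form of the intraclass correlation coefficient was used for the analytic and holistic conditions. As a minimal sketch, assuming a fully crossed design in which every rater in a condition scored every essay, the two-way random-effects, absolute-agreement ICC can be computed from two-way ANOVA mean squares. ICC(2,1) is the reliability of a single rater, and ICC(2,k) is the reliability of the mean of k raters. The function name `icc_two_way_random` and the 6 × 4 score matrix are hypothetical illustrations, not the thesis data or analysis code.

```python
from statistics import fmean


def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a fully crossed essays x raters matrix.

    Hypothetical helper: ratings[i][j] is the score rater j gave essay i.
    Assumes a two-way random-effects model with absolute agreement.
    """
    n = len(ratings)       # number of essays (targets)
    k = len(ratings[0])    # number of raters
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between essays
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater main effects (ms_cols) count against agreement in both forms
    icc_single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return icc_single, icc_average


# Toy data: 6 essays scored by 4 raters (illustrative only)
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc_two_way_random(scores)
print(f"ICC(2,1) = {single:.2f}")   # ICC(2,1) = 0.29
print(f"ICC(2,k) = {average:.2f}")  # ICC(2,k) = 0.62
```

In this toy example the raters rank the essays similarly but differ strongly in leniency, so the absolute-agreement ICC for a single rater is low. Averaging over four raters raises it, which is why it matters whether final scores rest on one rater or on a pooled mean.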

Item Type: Thesis (Master)
Supervisor name: Arboleda Cardona, J.C. and Niessen, A.S.M.
Degree programme: Psychology
Differentiation route: Talent Development and Creativity (TDC) [Master Psychology]
Date Deposited: 13 Jul 2023 09:04
Last Modified: 13 Jul 2023 09:04
URI: http://gmwpublic.studenttheses.ub.rug.nl/id/eprint/2354
