Abstract
This study examined the reliability and accuracy of ChatGPT in scoring reflective essays, using data from a course on Teaching and Learning in English for Academic Writing in which students wrote a systematic literature review on computing education. The main goal was to compare ChatGPT's scores with those of human tutors. ChatGPT was given the assessment rubric and prompt instructions, the research team verified its understanding of the rubric, and it then scored the essays independently. The dataset comprised 72 student reflections, each of which was also scored by expert tutors. Agreement was evaluated statistically: Cohen's Kappa and the Intraclass Correlation Coefficient (ICC) both indicated low agreement, showing poor consistency between ChatGPT and human scores. Error analysis revealed a consistent pattern: ChatGPT assigned higher scores than the human raters, and the difference was statistically significant. While ChatGPT worked efficiently and returned results quickly, it lacked the ability to interpret subtle meaning and did not always follow the rubric as expected. The study concludes that ChatGPT can support formative feedback but should not replace human judgment in academic writing assessment.
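As a minimal illustration of the agreement statistic named in the article's title, the sketch below computes Cohen's kappa for two raters scoring the same items on a nominal rubric scale. The rater lists are hypothetical examples, not the study's data, and this is a plain implementation of the standard kappa formula rather than the authors' analysis pipeline.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items (nominal categories).

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's marginals.
    """
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric-band scores (1-5) for four essays.
human = [2, 3, 3, 4]
ai = [2, 3, 4, 4]
print(round(cohens_kappa(human, ai), 3))  # → 0.636
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, which is the pattern the study reports between ChatGPT and the human tutors.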
| Original language | English |
|---|---|
| Article number | 06014 |
| Journal | E3S Web of Conferences |
| Volume | 645 |
| DOIs | |
| Publication status | Published - 28 Aug 2025 |
| Event | 1st International Conference on Green Engineering for Sustainable Future, ICoGESF 2025 - Hybrid, Surabaya, Indonesia Duration: 5 Jul 2025 → … |
| Title | Comparing AI and Human Assessment of Academic Writing Skills: A Kappa Analysis |