Document Type

Thesis

Degree

Master of Science

Major

Computer Science

Date of Defense

7-17-2025

Graduate Advisor

Badri Adhikari, Ph.D.

Committee

Azim Ahmadzadeh, Ph.D.

Sharlee Climer, Ph.D.

Mark Hauschild, Ph.D.

Abstract

Background: Automated essay scoring (AES) is a challenging deep learning problem. The two most widely used approaches to predicting essay quality scores, supervised learning-based and LLM-based, each have limitations. Supervised learning-based methods are more accurate, but they only predict a score and offer no descriptive feedback to students. LLM-based methods can offer rubric-guided feedback but are known to be less accurate.

Methods: This work focuses on improving the accuracy of state-of-the-art LLM-based AES methods. We began by thoroughly investigating why these methods perform poorly on certain datasets and certain examples, which led us to identify several limitations. We then tested several new prompting strategies to improve accuracy on these hard cases while maintaining accuracy on the others.

Results: Our new prompting strategies improved state-of-the-art essay scoring accuracy by 11%, raising the average quadratic weighted kappa (QWK) agreement score from 0.53 to 0.60. This significant improvement comes from enriching the context for the LLM, reiterating critical instructions, and walking the LLM through the grading process. While this improvement in accuracy suggests a promising direction for AES, the evaluation also revealed several tail risks that raise serious concerns about its real-world implementation.
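
For readers unfamiliar with the metric, QWK in the AES literature usually denotes quadratic weighted kappa, an agreement statistic between two raters that penalizes a disagreement by the squared distance between the two scores. The sketch below shows one common way to compute it. It is an illustration only, not the evaluation code used in the thesis; the function name, the argument names, and the example score range are assumptions made here.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and equal in length")
    n_levels = max_score - min_score + 1
    if n_levels < 2:
        raise ValueError("the score range must contain at least two levels")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts of (human, model) pairs
    human_hist = Counter(human)            # marginal counts for the human rater
    model_hist = Counter(model)            # marginal counts for the model

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: 0 on the diagonal, 1 at the maximum distance.
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            # Expected proportion if the two raters scored independently.
            weighted_expected += weight * human_hist[i] * model_hist[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters gave one identical score to every essay. Returning 1.0
        # is a convention chosen for this sketch; other implementations differ.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Example with made-up scores on a 1-to-4 scale.
human_scores = [1, 2, 3, 4, 2, 3]
model_scores = [1, 2, 4, 4, 1, 3]
print(quadratic_weighted_kappa(human_scores, model_scores, 1, 4))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.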

Recommended Citation

Fink, Thomas A., "Limitations of Using Large Language Models for Automated Essay Scoring" (2025). Theses. 495.
https://irl.umsl.edu/thesis/495

Download

Available for download on Tuesday, August 11, 2026

Included in

Artificial Intelligence and Robotics Commons, Educational Assessment, Evaluation, and Research Commons, Educational Technology Commons
