Document Type

Thesis

Degree

Master of Science

Major

Computer Science

Date of Defense

7-17-2025

Graduate Advisor

Badri Adhikari, Ph.D.

Committee

Azim Ahmadzadeh, Ph.D.

Sharlee Climer, Ph.D.

Mark Hauschild, Ph.D.

Abstract

Background: Automated essay scoring (AES) is a challenging deep learning problem. The two most widely used approaches to predicting essay quality scores, supervised learning-based and LLM-based, each have limitations. Supervised learning-based methods are more accurate, but they only predict a score and offer no descriptive feedback to students. LLM-based methods can offer rubric-guided feedback but are known to be less accurate.

Methods: This work focuses on improving the accuracy of state-of-the-art LLM-based AES methods. We first investigated why these methods perform poorly on certain datasets and certain examples, which revealed several limitations. We then tested several new prompting strategies to improve accuracy on these hard cases while maintaining accuracy on the others.

Results: Our prompting strategies improved state-of-the-art essay scoring accuracy by 11%, raising the average quadratic weighted kappa (QWK) agreement score from 0.53 to 0.60. This significant improvement comes from enriching the context for the LLM, reiterating critical instructions, and walking the LLM through the grading process. While this improvement in accuracy suggests a promising direction for AES, the evaluation also revealed several tail risks that raise serious concerns about its real-world implementation.
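
For readers unfamiliar with the metric, QWK (quadratic weighted kappa) measures agreement between two sets of ordinal scores, penalizing each disagreement by the squared distance between the two scores. The following is a minimal sketch of the standard formula, not the evaluation code used in the thesis; the function name, its parameters, and the choice to return 1.0 when both raters assign one identical constant score are assumptions made for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores.

    Illustrative helper (hypothetical name and signature, not from the thesis).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_levels = max_score - min_score + 1
    if n_levels < 2:
        raise ValueError("need at least two distinct score levels")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (score_a, score_b)
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave the same single score to every essay.
        # Assumed convention: treat this as perfect agreement.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    human = [1, 2, 3, 4]
    model = [1, 2, 3, 4]
    print(quadratic_weighted_kappa(human, model, min_score=1, max_score=4))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.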

Available for download on Tuesday, August 11, 2026
