Using Lexical Properties of Handwritten Equations to Estimate the Correctness of Students’ Solutions to Engineering Problems

Publication Information


  • Thomas F. Stahovich, University of California
  • Hanlung Lin, University of California & Amazon
  • Justin Gyllen, University of California


  • Pages 459-483


  • Educational data mining, Digital ink, Problem solving, Handwritten equations, Smartpen


  • We present a technique that examines the handwritten equations in a student’s solution to an engineering problem and from them estimates the correctness of the work. More specifically, we demonstrate that lexical properties of the equations correlate with the grade a human grader would assign. We characterize these properties with a set of features that include the number of occurrences of various classes of symbols and of binary and tripartite sequences of those classes. Support vector machine (SVM) regression models trained with these features achieved a correlation with grade of r = .433 (p < .001) on a combined set of six exam problems. Prior work suggests that the number of long pauses in a student’s writing while solving a problem correlates with correctness. We found that combining this pause feature with our lexical features produced more accurate predictions than using either type of feature alone. SVM regression models trained using an optimized subset of three lexical features and the pause feature achieved an average correlation with grade across the six problems of r = .503 (p < .001). These techniques are an important step toward creating systems that can automatically assess handwritten coursework.
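
The lexical features described in the abstract can be illustrated with a minimal sketch. It assumes the handwritten equation has already been recognized into a string of symbols, and it reads "binary and tripartite sequences" as runs of two and three consecutive symbol classes. The class taxonomy (`SYMBOL_CLASSES`) and the function names `classify`, `lexical_features`, and `pearson_r` are hypothetical and are not taken from the paper. The sketch covers feature counting and the correlation measure only, not the SVM regression step.

```python
from collections import Counter
from math import sqrt

# Hypothetical symbol classes, chosen for illustration only;
# the paper's actual taxonomy is not given in the abstract.
SYMBOL_CLASSES = {
    "digit": set("0123456789"),
    "letter": set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"),
    "operator": set("+-*/^"),
    "equals": set("="),
    "bracket": set("()[]{}"),
}


def classify(symbol):
    """Map one recognized symbol to its (assumed) class name."""
    for name, members in SYMBOL_CLASSES.items():
        if symbol in members:
            return name
    return "other"


def lexical_features(equation):
    """Count single classes and runs of two and three consecutive classes."""
    classes = [classify(ch) for ch in equation if not ch.isspace()]
    features = Counter()
    for n in (1, 2, 3):
        for i in range(len(classes) - n + 1):
            features[" ".join(classes[i:i + n])] += 1
    return features


def pearson_r(xs, ys):
    """Pearson correlation between predictions and grades (0.0 if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


features = lexical_features("F = m*a")
print(features["letter"], features["letter equals"])
```

In a full pipeline, these counts would form the feature vector passed to a regression model, and `pearson_r` would compare the model's predicted grades with those assigned by a human grader.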