
Machines Scoring Essays On Standardized Tests

By Randall Mayes | Sep 18, 2014 05:46 PM EDT

Standardized testing in educational programs generates an extremely large number of essays. This has caused the educational community to reassess its grading options.

Common Core state standards, new educational standards and compulsory statewide student assessments require students to write standardized essays in order to graduate. Since 2005, the Scholastic Aptitude Test (SAT) for college entrance has included an essay section to measure writing skills.

Graduate and professional schools also use writing as part of the admissions process. More recently, universities have begun experimenting with Massive Open Online Courses (MOOCs), some with more than 100,000 students.

The market for standardized educational testing has spawned commercial ventures and become a multimillion-dollar business. Through bidding, educational testing companies that employ thousands of human readers have entered into contracts with individual states and private schools.

"For overall student assessment, grading essays is the most expensive component of standardized educational testing," reports The Futurist. For the last three decades, essay scoring has relied on human readers who have college degrees from different fields, have demonstrated writing ability and are qualified through scoring-agreement rates with other readers in practice sets.

From a business perspective, using artificial intelligence dramatically reduces the cost and time required to evaluate student writing. The algorithm developed by Educational Testing Service for the Graduate Management Admission Test (GMAT) can score 16,000 essays in 20 seconds, The Futurist report states.

Automated essay scoring involves the development of algorithms that amplify human intelligence. Given examples already scored by expert human readers, the algorithms learn incrementally how to evaluate writing, refining their predictions with each sample essay.

Using these sample essays, AI researchers build statistical models that predict the scores human readers would give. The scoring algorithms look for assigned linguistic features such as organization, word choice, sentence fluency, punctuation, word count, sentence count, number of long words, grammar, vocabulary and transitions. The features are weighted and combined into a composite score.
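
To make the idea of weighted features concrete, here is a minimal sketch in Python. It measures only three of the surface features named above (word count, sentence count and number of long words) and adds them up with weights. The weights, the seven-letter cutoff for a "long" word and the function names are assumptions made for illustration; they are not taken from any testing company's system, where the weights would be estimated from essays already scored by human readers.

```python
import re

# Hypothetical weights for illustration only. A real system would
# estimate them from essays already scored by human readers.
WEIGHTS = {"word_count": 0.01, "sentence_count": 0.1, "long_words": 0.05}

def extract_features(essay: str) -> dict:
    """Count a few surface features of an essay."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        # Assumption: a "long" word has seven or more letters.
        "long_words": sum(1 for w in words if len(w) >= 7),
    }

def composite_score(features: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the features, giving a single composite score."""
    return sum(weights[name] * features[name] for name in weights)

essay = "Automated scoring is fast. Human readers understand irony!"
features = extract_features(essay)
score = composite_score(features)
print(features, round(score, 2))
```

A production model would use many more features and would be checked against held-out human scores, but the final step is the same in spirit: measured features go in, a single number comes out.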

"For assessing style components of essays, one weakness of human readers is that they are subjective, potentially influenced by different backgrounds and inconsistent. For assessing the content of essays, human readers have the advantage of common sense and reasoning ability. Humans are able to recognize essay development through irony, rhetoric, creativity, logical development, cause and effect and narrative," according to Shayne Miel, director of AI at Measurement Inc.

"One of the benefits of AI is that you, in some sense, average out all of the inconsistencies of the human readers," said Miel. "This is how AI is able to surpass the quality of scoring that existed in the training data."
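
The averaging effect Miel describes can be illustrated with a toy simulation. The sketch below assumes each essay has a "true" score and that each human reader reports that score plus independent random error; the number of essays, the number of readers and the size of the error are all made-up values. It is not Measurement Inc.'s method, only a demonstration that an average over several inconsistent readers tends to land closer to the true score than any single reader does.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Assumed values for illustration only.
N_ESSAYS, N_READERS, NOISE_SD = 200, 5, 1.0

# Hypothetical "true" quality of each essay on a 1-to-6 scale.
true_scores = [random.uniform(1, 6) for _ in range(N_ESSAYS)]

# Each reader reports the true score plus independent random error.
readings = [
    [t + random.gauss(0, NOISE_SD) for _ in range(N_READERS)]
    for t in true_scores
]

# Average error when relying on one reader per essay.
single_error = mean(abs(r[0] - t) for r, t in zip(readings, true_scores))

# Average error when the readers' scores are averaged first.
averaged_error = mean(abs(mean(r) - t) for r, t in zip(readings, true_scores))

print(f"one reader: {single_error:.2f}  averaged: {averaged_error:.2f}")
```

A model trained on many human-scored essays is not literally averaging readers, but it is fitted to the overall trend in their scores, which is one way to read Miel's remark about surpassing the quality of the training data.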

Currently, hybrid scoring prevails because of the limitations of solely human or solely AI scoring. Human strengths in scoring essays are the computer's weaknesses, and vice versa.
