When I started teaching a long time ago, I never imagined I would one day see a little laptop on each of my students’ desks, let alone something like ChatGPT. For a little less than half my career, I taught French; back then, something like Google Translate seemed like pure science fiction. Yet here we are.
As with all technological advances, we are going to pursue this one even though it wouldn’t be so bad if we didn’t. Human beings just can’t help themselves. The Luddites smashed the textile machines that took their jobs, but innovation in manufacturing went on anyway. Self-restraint is not a virtue we humans can maintain for long; our inner drives, born in Paleolithic desperation, eventually have their way. We cannot ban AI development, because somebody, somewhere will pursue it regardless, and those who didn’t will be at a disadvantage. All things considered, I think educators can look forward to the AI developments soon to be upon us with a sense of optimism instead of dread.
My interest in artificial intelligence grew from my interest in computer programming. When I started out learning BASIC in the early 1990s, I built a program that would block vulgar words in my students’ data entry fields. We were using pre-Windows DOS machines, 286s donated from a nearby Air Force base that was closing. It occurred to me to try making what I now know is called a “chatbot”: software that would converse with me in simple sentences and, when it didn’t know how to respond, would ask me what to say and store my answer as an option for future conversations. The reader will not be surprised to learn that this project did not work. In hindsight, I was way, way out of my league in attempting something like that. Besides, the computing power necessary for machine learning, let alone the troves of digital data needed to train an AI, did not exist in 1994. But I am content knowing that I had the basic gist of machine learning that real engineers would eventually put to use.
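For the curious, the idea I was chasing looked something like the sketch below. This is a reconstruction of the concept in Python rather than the BASIC I was using then, not my original code:

```python
# A reconstruction of the 1990s chatbot idea: when the bot has no stored
# response for an input, it asks the user what it should have said and
# remembers that answer for next time.
responses: dict[str, str] = {}

def chat() -> None:
    while True:
        prompt = input("You: ").strip().lower()
        if prompt in ("quit", "bye"):
            break
        if prompt in responses:
            print("Bot:", responses[prompt])
        else:
            # The "learning" step: store the human's answer for future use.
            taught = input("Bot: I don't know how to respond. What should I say? ")
            responses[prompt] = taught

if __name__ == "__main__":
    chat()
```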
About five years ago, I started exploring the idea of automating some of my grading. I was teaching social studies, and I assigned summarizing as the way students processed textbook articles. I am convinced this is a far better method than having students answer questions about a text they are reading. The problem was that I had about a hundred students across six different grade levels, and the beginning of a unit would generate several hundred summaries a week to grade. Could I write an app that would grade my students’ summaries? The answer turned out to be a pretty decent “yes!”
AI-Scored Summaries
The AI grading assistant here at Innovation Assessments was trained on 500 human-scored summaries. The algorithm examines eleven features of the submitted text and compares them to the same features of up to seven model summaries scoring 100%. These features include a Flesch-Kincaid readability measure, word count, common proper nouns and verb phrases, and statistical comparisons like cosine and Jaccard similarity. Before analysis, the app removes stop words, reduces words to their root form (lemma), and maps many words to a common synonym, so the app can recognize ideas written in slightly different wording. To establish the scoring algorithm, I charted these comparisons in a spreadsheet and adjusted the weights until the AI scored the work about as I would have, most of the time.
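To give a feel for the statistical comparisons involved, here is a minimal sketch of the preprocessing and similarity measures named above. It assumes the NLTK library for tokenizing and lemmatizing; the function names are mine, not the app’s:

```python
import math
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, and reduce words to their lemmas."""
    tokens = nltk.word_tokenize(text.lower())
    return [LEMMATIZER.lemmatize(t) for t in tokens if t.isalpha() and t not in STOP]

def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two bag-of-words frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def jaccard_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard similarity: shared vocabulary over combined vocabulary."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

student = preprocess("The Roman Empire fell because of invasions and economic decline.")
model = preprocess("Rome declined due to barbarian invasions and a failing economy.")
print(cosine_similarity(student, model), jaccard_similarity(student, model))
```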
I was very pleased with the results. Scoring student work became much faster: the app brings up the AI score estimate, and I check it to confirm. This is why I call it an “AI grading assistant”: it still needs a human supervisor. As time went on, though, I came to trust the app more and more. When I set up an assignment, I entered my own summary of the target text from the start. Once students completed the task, I first scored the work of students who usually earn 100%. I could add up to six of these to the “corpus”, the body of model text the software uses to judge. The next step was to run the AI grading assistant on the submissions of the rest of the class.
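Conceptually, the scoring step reduces to comparing a submission against every model in the corpus and combining weighted feature similarities into an estimate. The sketch below reuses the helper functions from the earlier sketch; its two features and their weights are illustrative placeholders, not the eleven hand-tuned features the real app weighs:

```python
# A conceptual sketch of scoring a submission against a corpus of models.
# The features and weights here are placeholders for illustration only.
def score_against_corpus(student: str, corpus: list[str]) -> float:
    """Return a 0-100 estimate: the best weighted match among the models."""
    best = 0.0
    words = preprocess(student)
    for model in corpus:
        model_words = preprocess(model)
        similarity = (0.6 * cosine_similarity(words, model_words)
                      + 0.4 * jaccard_similarity(words, model_words))
        best = max(best, similarity)
    return round(100 * best, 1)

models = ["Rome fell due to invasions, economic decline, and political instability."]
print(score_against_corpus("The Roman Empire collapsed because of invasions and a weak economy.", models))
```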
Scoring summaries in this way required one or more human-composed models. Next, I wondered whether I could write an algorithm that would itself summarize a text. I am not able to write code that can write “in its own words”. Instead, my little bot mainly extracts the first sentence of each paragraph, plus selected other sentences verbatim when they meet certain criteria (such as the presence of key words identified by their frequency in the text). I had my doubts about how effective this would be; surely it would sometimes lose important meaning, since it was a formula and not really “reading” like a human would. Well, get this …
… When I fed the algorithm-generated summary into the AI grading assistant, which evaluates it by comparison to human-composed models, it scored 100%. Every. Time.
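Here is a minimal sketch of that extraction formula as described above: the first sentence of each paragraph, plus any other sentence dense in the text’s most frequent key words. The keyword count and overlap threshold are illustrative guesses, not the values my bot actually used:

```python
import re
from collections import Counter

# A small inline stop word list keeps this sketch self-contained.
STOP = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was",
        "it", "that", "for", "on", "as", "with", "by", "at", "this"}

def extractive_summary(text: str, keyword_count: int = 10, overlap: int = 3) -> str:
    """Extract the first sentence of each paragraph, plus other sentences
    containing at least `overlap` of the text's most frequent key words."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    keywords = {w for w, _ in Counter(words).most_common(keyword_count)}

    summary = []
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        sentences = re.split(r"(?<=[.!?])\s+", para)
        summary.append(sentences[0])  # always keep the topic sentence
        for sent in sentences[1:]:
            sent_words = set(re.findall(r"[a-z']+", sent.lower()))
            if len(sent_words & keywords) >= overlap:
                summary.append(sent)  # keep keyword-dense sentences verbatim
    return " ".join(summary)
```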
AI-Scored Short Answer Tests
Another challenge of teaching social studies with a lot of reading and writing was the large volume of student work to grade in the form of short answer tests, particularly document-based analyses. Could similar software assist in scoring short answer tests?
The app development method was about the same, and I had hundreds of student work samples to analyze. Using methods similar to those for grading summaries, the new app allowed the teacher to add up to five versions of full-credit answers to the corpus for comparison. One feature not examined by the summary grading assistant was the degree to which a student’s writing is analytical (as opposed to merely descriptive). This project went fairly well: well enough for an amateur programmer, and accurate enough that the short answer scoring was a huge help to me. Click here to read more about the development of an algorithm to measure the degree of analysis in a student writing sample.
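The linked article covers the real algorithm; as a heavily simplified illustration of the general idea, one could count discourse markers of analysis (causal and evaluative connectives) relative to sentence count. This is my first-approximation guess at such a measure, not the method the app uses:

```python
import re

# Connectives that tend to signal analysis rather than description.
# This marker list and ratio are an illustrative guess, not the app's method.
ANALYTICAL_MARKERS = {"because", "therefore", "consequently", "suggests",
                      "implies", "however", "whereas", "significance",
                      "caused", "led", "resulted", "evidence"}

def analysis_ratio(text: str) -> float:
    """Rough ratio of analytical markers to sentences: higher = more analytical."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for w in words if w in ANALYTICAL_MARKERS)
    return hits / len(sentences) if sentences else 0.0
```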
The AI-assisted scoring of short answer tests was most successful at evaluating responses with a limited range of credit-worthy answers. The AI performed well on questions like “What caused the fall of the Roman Empire?” It did not perform well on questions such as evaluating the reliability of a primary source, since the range of possible correct answers would have required far more models to train on than the five the software allows. Nonetheless, the short answer AI grading assistant saved me tons of time. It let me keep a teaching method that was otherwise very time consuming, lightening the workload so I could spend my time on curriculum development.
Opportunities for AI to Coach Students
I came to have so much confidence in the AI grading assistant that I built in access for my students. Students composing their summaries at InnovationAssessments can access the coach, which gives them a fairly accurate score estimate while they write. This takes a little of the mystery out of “how am I doing?” and helps develop strong summarizing skills: reading comprehension and basic composition.
The AI grading assistant is also an effective coach for short answer exercises. Enabling the coach for a practice run at short answer tasks gives students instant estimates of the quality of their submissions, and the AI offers little hints and suggestions drawn from the corpus of model answers on which it was trained.
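The article doesn’t detail how the hints are generated; one natural approach, sketched below under that assumption, is to surface key terms that appear in every model answer but are missing from the student’s draft. It reuses the `preprocess` helper from the earlier sketch:

```python
from collections import Counter

# A sketch of one way coaching hints *might* be drawn from the corpus:
# suggest key words common to all model answers but absent from the draft.
# (An assumption about the mechanism, not the app's documented behavior.)
def coaching_hints(draft: str, corpus: list[str], max_hints: int = 3) -> list[str]:
    draft_words = set(preprocess(draft))
    counts: Counter = Counter()
    for model in corpus:
        counts.update(set(preprocess(model)))  # count models containing each word
    missing = [w for w, c in counts.most_common()
               if c == len(corpus) and w not in draft_words]
    return [f"Consider addressing: {w}" for w in missing[:max_hints]]
```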
We’re Not Being Replaced Yet
AI used in the way described here did not replace me; it still required supervision. I would assert that it enhanced my work, allowing me to use a better teaching methodology that was otherwise impractical. The way I wanted to teach was a recipe for burnout in the context of my particular teaching job: assign three summary tasks to a hundred students over a two-week period and, well, the reader can do the math. AI-assisted scoring let me do the best job I could without burning myself out. That is a great reason to continue AI development and research, even for amateur programmers like me.
There is a very solid reason why AI will not replace teachers: it cannot develop the kind of personal rapport with students that has always been the foundation youngsters need in order to learn. AI cannot form emotional bonds with people. If the day ever comes that it can, then we will have something more than intelligence that is artificial; we will have a consciousness.