The STEM Writing Project is using classroom research and data science to better understand how students develop technical writing skills in large-enrollment biology courses. Our current NSF-sponsored grant asks whether students' development as writers can be accelerated using a mix of scripted active instruction, automated support of writing in progress, and holistic feedback.
One specific question we have is whether the structure and main focus of instructors' comments affect the rate of students' writing development over time. To answer this, we must extract instructor comments from student reports, categorize the comments according to their subject and structure, and then correlate the types of comments instructors make with student performance.
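To make the categorization step concrete, here is a minimal keyword-rule sketch. The category names and keywords are invented for illustration only; they are not the project's actual coding scheme, and the project's real classifier (built in R) is not shown here.

```python
import re

# Hypothetical categories and keyword lists, purely for illustration.
CATEGORY_KEYWORDS = {
    "mechanics": {"grammar", "spelling", "citation", "format"},
    "content": {"hypothesis", "data", "analysis", "evidence"},
    "praise": {"good", "great", "excellent", "nice"},
}

def classify_comment(comment: str) -> str:
    """Assign a comment to the category whose keywords it mentions most."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"

print(classify_comment("Good use of evidence, but check your citation format."))
# → mechanics (two "mechanics" keywords outscore one match in each other category)
```

A rule-based approach like this is transparent but brittle; the statistical text-classification models discussed in the Background are designed to generalize beyond fixed keyword lists.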
Classifying TA comments has been informative, but doing it by hand is impractical. First, 10,000–12,000 comments must be scored each semester. Second, there is a significant risk of “coding drift” over time, which reduces accuracy.
My project for the Faculty Learning Community for R was to create an automated comment classifier that can sort TA comments from student reports into the same categories that we use for hand-coded data. This automated classifier will be used within the larger NSF-funded research project to:
The plan is to:
After completing the FLC project, my goals are to:
Looking further ahead, I hope to use the work presented here as a baseline for evaluating other potential text classification models that are described in the Background.
Copyright © 2019 A. Daniel Johnson. All rights reserved.