- Giora Alexandron, Weizmann Institute of Science
- Lisa Y. Yoo, Massachusetts Institute of Technology
- José A. Ruipérez-Valiente, Massachusetts Institute of Technology
- Sunbok Lee, University of Houston
- David E. Pritchard, Massachusetts Institute of Technology
- Learning Analytics, MOOCs, Replication research, Sensitivity analysis, Fake learners
- The rich data that Massive Open Online Course (MOOC) platforms collect on the behavior of millions of users provide a unique opportunity to study human learning and to develop data-driven methods that can address the needs of individual learners. This type of research falls into the emerging field of learning analytics. However, learning analytics research tends to ignore the reliability of results based on MOOC data, which are typically noisy and generated by a largely anonymous crowd of learners. This paper provides evidence that learning analytics in MOOCs can be significantly biased by users who abuse the anonymity and open nature of MOOCs, for example by setting up multiple accounts; the bias arises from both the number of such users and their aberrant behavior. We identify these users, denoted fake learners, using dedicated algorithms. The methodology for measuring the bias caused by fake learners’ activity combines the ideas of Replication Research and Sensitivity Analysis. We replicate two highly cited learning analytics studies with and without fake learners’ data, and compare the results. While the results of one study were relatively stable against fake learners, in the other, removing the fake learners’ data significantly changed the results. These findings raise concerns regarding the reliability of learning analytics in MOOCs, and highlight the need to develop more robust, generalizable, and verifiable research methods.