Towards Designing Profitable Courses: Predicting Student Purchasing Behaviour in MOOCs

Publication Information


  • Mohammad Alshehri, Department of Management Information Systems, College of Business, University of Jeddah, Jeddah, Saudi Arabia
  • Ahmed Alamri, Department of Management Information Systems, College of Business, University of Jeddah, Jeddah, Saudi Arabia
  • Alexandra I. Cristea, Durham University
  • Craig D. Stewart, Durham University


  • Pages 215-233


  • Machine learning, MOOCs, Purchasing prediction, Learner analytics


  • Since their ‘official’ emergence in 2012 (Gardner and Brooks 2018), massive open online courses (MOOCs) have grown rapidly. They offer low-cost education for both students and content providers. However, course purchasing is currently very low: fewer than 1% of the students enrolled on a given online course opt to purchase its certificate. The most recent literature on MOOCs focuses on identifying factors that contribute to student success, completion level and engagement. One ultimate target of MOOC platforms is to become self-sustaining, enabling partners to generate revenue and offset operating costs. Nevertheless, research analysing learners’ purchasing behaviour on MOOCs remains limited. This study therefore aims to predict students’ purchasing behaviour, and hence a MOOC’s revenue, from learners’ rich clickstream activity and demographic data. Specifically, we compare how well several machine learning algorithms, namely RandomForest, GradientBoosting, AdaBoost and XGBoost, predict course purchasability. We use a large-scale dataset of 23 runs spread over 5 courses delivered by The University of Warwick via FutureLearn between 2013 and 2017. We further identify the common representative predictive attributes that influence a learner’s decision to purchase a certificate. Using only the time spent on each step, our proposed model achieved promising accuracies of between 0.82 and 0.91. Adding learner demographics (e.g. gender, age group, level of education, and country) raised accuracy to between 0.83 and 0.95, which shows that demographics have a considerable impact on the model’s performance. The outcomes of this study are expected to help design future courses and predict the profitability of future runs. They may also help determine which personalisation features could be offered to increase MOOC revenue.
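The abstract describes two feature sets: time spent on each course step, and that same set extended with learner demographics. The sketch below shows one way such feature vectors could be assembled before being passed to classifiers such as scikit-learn's RandomForestClassifier, GradientBoostingClassifier or AdaBoostClassifier. It is not the authors' pipeline. Several details are assumptions for illustration only. The record layout `(learner_id, step, first_visited_at, last_completed_at)` and the timestamp format are assumed. Time on a step is approximated as completion time minus first-visit time, with 0 for incomplete steps. The helpers `time_per_step`, `one_hot` and `add_demographics` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime

# ASSUMPTION: timestamps are strings in this format; the real export may differ.
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def time_per_step(rows, steps):
    """Hypothetical helper: map each learner to a vector of seconds per step.

    rows: iterable of (learner_id, step, first_visited_at, last_completed_at).
    ASSUMPTION: time on a step = last_completed_at - first_visited_at;
    unvisited or uncompleted steps count as 0 seconds.
    """
    index = {s: i for i, s in enumerate(steps)}
    feats = defaultdict(lambda: [0.0] * len(steps))
    for learner, step, visited, completed in rows:
        vec = feats[learner]  # register every learner seen, even with no completions
        if step not in index or not visited or not completed:
            continue
        delta = datetime.strptime(completed, TIME_FMT) - datetime.strptime(visited, TIME_FMT)
        vec[index[step]] = max(delta.total_seconds(), 0.0)
    return dict(feats)


def one_hot(value, categories):
    """Encode a categorical value; unknown or missing values become all zeros."""
    return [1.0 if value == c else 0.0 for c in categories]


def add_demographics(features, demographics, schema):
    """Hypothetical helper: append one-hot demographic columns to each vector.

    demographics: {learner_id: {field: value}}, e.g. gender, age group.
    schema: ordered list of (field, categories) fixing the column layout.
    """
    out = {}
    for learner, vec in features.items():
        info = demographics.get(learner, {})
        extra = []
        for field, cats in schema:
            extra += one_hot(info.get(field), cats)
        out[learner] = vec + extra
    return out


# Usage on toy data (invented for illustration, not from the study):
steps = ["1.1", "1.2"]
rows = [
    ("a", "1.1", "2016-01-01 10:00:00", "2016-01-01 10:05:00"),
    ("a", "1.2", "2016-01-01 10:05:00", None),
    ("b", "1.2", "2016-01-02 09:00:00", "2016-01-02 09:00:30"),
]
X_time = time_per_step(rows, steps)
# X_time == {"a": [300.0, 0.0], "b": [0.0, 30.0]}
X_full = add_demographics(X_time, {"a": {"gender": "female"}},
                          [("gender", ["female", "male"])])
# X_full == {"a": [300.0, 0.0, 1.0, 0.0], "b": [0.0, 30.0, 0.0, 0.0]}
```

Keeping the step order and the demographic schema fixed guarantees that every learner's vector has the same column layout. That consistency is what lets the "time only" and "time plus demographics" feature sets be compared model by model.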