Early Prediction of Students’ Academic Achievement: Categorical Data from Fully Online Learning on Machine-Learning Classification Algorithms

Tuti Purwoningsih, Harry B. Santoso, Kristanti A. Puspitasari, Zainal A. Hasibuan


One challenge in predicting students' academic achievement in fully online learning is defining the dataset used as a predictor. Accordingly, in this study, we define the dataset as categorical data derived from the demographic profiles, activities, and learning habits of fully online learning students at Universitas Terbuka (UT). The main objective is to predict the academic achievement of fully online learning students early, using categorical data as features, and to identify the most relevant predictors. We apply several machine learning (ML) classification algorithms to make these early predictions. The study uses 75,136,349 UT-LMS log records combined with the demographic profiles of 101,617 undergraduate students in fully online learning. The data were converted into categorical form to minimize noise arising from the large dataset. The factors found to influence students' academic achievement are online learning activities related to access day and study time, together with the student's profession profile. Most students accessed the UT-LMS on Mondays, in the evening. In the experiments, the random forest algorithm achieved 85.03% accuracy when the dataset was balanced with SMOTE, ordinal data were encoded with a label encoder, and nominal data were encoded with a one-hot encoder. The findings can help lecturers design instructional strategies that improve students' academic achievement. The principal novel contribution of this study is a way of exploring UT-LMS log data and student demographic data and defining them as a categorical dataset for machine-learning classification algorithms. The categorization process in this study is more of an art than a science, but this research can form the basis for similar studies grounded in other scientific principles of analysis, so that subsequent work can achieve higher accuracy.
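As an illustration of the preprocessing described in the abstract (label encoding for ordinal data, one-hot encoding for nominal data, and SMOTE-style balancing), the following is a minimal sketch in plain Python. It is not the authors' code. The category levels, the choice of which feature is ordinal or nominal, the toy records, and the helper names `label_encode`, `one_hot`, `encode`, and `smote_like` are all assumptions made for illustration. The random forest step is not shown.

```python
import random

# Hypothetical category levels; the study's actual categories are not given here.
TIME_ORDER = ["morning", "afternoon", "evening", "night"]  # treated as ordinal (assumption)
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]   # treated as nominal (assumption)


def label_encode(value, ordered_levels):
    """Map an ordinal category to its rank in a fixed order."""
    return ordered_levels.index(value)


def one_hot(value, levels):
    """Map a nominal category to a 0/1 indicator vector."""
    return [1 if value == level else 0 for level in levels]


def encode(record):
    """Encode one (study_time, access_day) record as a numeric feature vector."""
    study_time, access_day = record
    return [label_encode(study_time, TIME_ORDER)] + one_hot(access_day, DAYS)


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Invented toy data: ((study_time, access_day), outcome).
data = [
    (("evening", "Mon"), "pass"),
    (("evening", "Tue"), "pass"),
    (("afternoon", "Mon"), "pass"),
    (("morning", "Wed"), "pass"),
    (("evening", "Sun"), "pass"),
    (("night", "Sat"), "fail"),
    (("night", "Fri"), "fail"),
    (("morning", "Sat"), "fail"),
]

X = [encode(record) for record, _ in data]
y = [label for _, label in data]

# Oversample the minority class until both classes have the same size.
minority = [x for x, label in zip(X, y) if label == "fail"]
n_needed = y.count("pass") - y.count("fail")
X_balanced = X + smote_like(minority, n_needed)
y_balanced = y + ["fail"] * n_needed
```

Interpolating between one-hot vectors yields fractional indicator values in the synthetic samples. How the study handled this for its categorical features is not stated in the abstract, so the sketch leaves those values as they are.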

Keywords: learning management system, fully online learning, academic achievement, machine learning.





