Examining Engineering Students' Academic Performance Using Machine Learning Algorithms as a Data Analysis Tool

Research output: Contribution to journal › Article › peer-review

Abstract

As the demand for engineers continues to grow, understanding the factors that influence the academic performance of engineering students has become increasingly important. While much of the existing research has focused on predicting common indicators such as grade point average (GPA), the time it takes students to complete their academic programs, known as time-to-degree (TTD), has received comparatively less attention. Furthermore, recent advancements in artificial intelligence and machine learning have provided new data analysis tools for performing predictive analysis on large educational datasets. This study leverages a range of machine learning algorithms, including multiple linear regression, binary logistic regression, decision trees, random forest, XGBoost, and LightGBM, to analyze GPA and TTD data from records of 7,871 undergraduate engineering students at a public research university in the United States. First, we evaluate the performance of these algorithms in two tasks: predicting GPA (regression task) and classifying TTD (classification task). Second, we examine how variables related to students’ academic background (such as high school GPA, SAT scores, and major), demographic background (sex and underrepresented status), and socioeconomic background (eligibility for educational opportunity programs) contribute to predicting GPA and classifying TTD. The results indicate that multiple linear regression and binary logistic regression outperform single decision-tree methods. However, ensemble methods that combine multiple decision trees, such as random forest and LightGBM, provide better performance than regression-based models, particularly in predicting GPA. Moreover, the variable importance analysis using the SHapley Additive exPlanations (SHAP) method reveals that students’ background characteristics differentially predict GPA and TTD, with academic background variables holding the highest importance. The findings highlight the potential of machine learning techniques in examining educational datasets and offer insights for future research on leveraging machine learning as a data analysis tool in engineering education research.
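The variable importance analysis mentioned in the abstract rests on Shapley values, which split a single prediction into additive per-feature contributions. The sketch below is only an illustration of that idea, not the study's implementation: it computes exact Shapley values by enumerating feature coalitions for a toy linear GPA model. The feature names, coefficients, and background rows are all invented for the example and are not results from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance, enumerating every coalition."""
    n = len(x)

    def value(coalition):
        # Average prediction when features in `coalition` take the values
        # from x and the remaining features come from each background row.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|!(n-|S|-1)!/n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical feature names and invented coefficients (NOT from the study).
FEATURES = ["hs_gpa", "sat_scaled", "eop_eligible"]
COEFS = [0.6, 0.3, -0.1]
INTERCEPT = 1.0


def predict_gpa(row):
    """Toy linear model standing in for a fitted GPA regressor."""
    return INTERCEPT + sum(c * v for c, v in zip(COEFS, row))


# Invented background sample and one invented student record.
BACKGROUND = [
    [3.0, 0.5, 0.0],
    [3.5, 0.7, 1.0],
    [4.0, 0.9, 0.0],
    [2.5, 0.3, 1.0],
]
student = [3.8, 0.8, 1.0]

phi = shapley_values(predict_gpa, student, BACKGROUND)
baseline = sum(predict_gpa(r) for r in BACKGROUND) / len(BACKGROUND)

for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the student's prediction and the average prediction over the background rows, which is the additivity property that makes SHAP values comparable across features. Exact enumeration grows exponentially with the number of features, so practical analyses of tree ensembles typically rely on specialized algorithms rather than this brute-force approach.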

Original language: English (US)
Pages (from-to): 1515-1531
Number of pages: 17
Journal: International Journal of Engineering Education
Volume: 41
Issue number: 6
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Education
  • General Engineering

Keywords

  • academic performance
  • engineering education
  • machine learning
