TY - JOUR
T1 - Examining Engineering Students' Academic Performance Using Machine Learning Algorithms as a Data Analysis Tool
AU - Rodriguez-Hernandez, Carlos Felipe
AU - Gazula, Vinay Ram
AU - Shekhar, Prateek
N1 - Publisher Copyright: © 2025 TEMPUS Publications.
PY - 2025
Y1 - 2025
AB - As the demand for engineers continues to grow, understanding the factors that influence the academic performance of engineering students has become increasingly important. While much of the existing research has focused on predicting common indicators such as grade point average (GPA), the time it takes students to complete their academic programs (known as time-to-degree (TTD)) has received comparatively less attention. Furthermore, recent advancements in artificial intelligence and machine learning have provided new data analysis tools for performing predictive analysis on large educational datasets. This study leverages a range of machine learning algorithms, including multiple linear regression, binary logistic regression, decision trees, random forest, XGBoost, and LightGBM, to analyze GPA and TTD data from records of 7,871 undergraduate engineering students at a public research university in the United States. First, we evaluate the performance of these algorithms in two tasks: predicting GPA (regression task) and classifying TTD (classification task). Second, we examine how variables related to students’ academic background (such as high school GPA, SAT scores, and major), demographic background (sex and underrepresented status), and socioeconomic background (eligibility for educational opportunity programs) contribute to predicting GPA and classifying TTD. The results indicate that multiple linear regression and binary logistic regression outperform single decision-tree methods. However, ensemble methods that combine multiple decision trees, such as random forest and LightGBM, provide better performance than regression-based models, particularly in predicting GPA. Moreover, the variable importance analysis using the SHapley Additive exPlanations (SHAP) method reveals that students’ background characteristics differentially predict GPA and TTD, with academic background variables holding the highest importance. The findings highlight the potential of machine learning techniques in examining educational datasets and offer insights for future research on leveraging machine learning as a data analysis tool in engineering education research.
KW - academic performance
KW - engineering education
KW - machine learning
UR - https://www.scopus.com/pages/publications/105021849605
M3 - Article
AN - SCOPUS:105021849605
SN - 0949-149X
VL - 41
SP - 1515
EP - 1531
JO - International Journal of Engineering Education
JF - International Journal of Engineering Education
IS - 6
ER -