Data-driven methodology has become a key tool in computationally predicting material properties. Currently, these techniques are costly because generating sufficient training data for high-precision machine learning (ML) models is computationally demanding. In this study, we present a support vector regression (SVR)-based machine learning model to predict the stability of silicon (Si)–alkaline metal alloys, with a strong emphasis on the transferability of the model to new silicon alloys with different electronic configurations and structures. We elaborate on the role of the structural descriptor in imparting transferability to a model trained on limited data (~750 Si alloys) derived from the Materials Project database. Three popular descriptors, namely X-ray diffraction (XRD), sine Coulomb matrix (SCM), and orbital field matrix (OFM), are evaluated for representing Si alloys. The descriptors represent the material structures in the SVR model, which is coupled with hyperparameter tuning techniques such as Grid Search CV and Bayesian optimization to find the best-performing model for predicting the total energy, formation energy, and packing fraction of the Si alloy systems. The models are trained on Si alloys with lithium (Li), sodium (Na), potassium (K), magnesium (Mg), calcium (Ca), and aluminum (Al) metals, where the Si–Na and Si–Al systems are used as test structures. Our results show that XRD, an experimentally derived characterization of structures, performs most reliably as a descriptor for total energy prediction of new Si alloys. The study demonstrates that by qualitatively selecting training data, using hyperparameter tuning methods, and employing appropriate structural descriptors, the data requirements for robust and accurate ML models can be reduced.
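The workflow summarized above (descriptor vectors fed to an SVR, hyperparameters tuned by grid search, and whole chemical systems held out to probe transferability) can be sketched with scikit-learn as follows. This is a minimal illustration and not the authors' implementation: the descriptor matrix `X`, the target `y`, the system labels, and the parameter grid are all synthetic or assumed placeholders, and no Materials Project data or XRD/SCM/OFM featurization is involved.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): random vectors play the role of
# structural descriptors, and a noisy linear function plays the role of energy.
systems = np.array(["Si-Li", "Si-Na", "Si-K", "Si-Mg", "Si-Ca", "Si-Al"])
n_samples, n_features = 120, 16
X = rng.normal(size=(n_samples, n_features))
labels = rng.choice(systems, size=n_samples)
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=n_samples)

# Hold out entire chemical systems so the test set contains only alloys
# whose system was never seen during training.
held_out = np.isin(labels, ["Si-Na", "Si-Al"])
X_train, y_train = X[~held_out], y[~held_out]
X_test, y_test = X[held_out], y[held_out]

# Illustrative hyperparameter grid (assumed values, not from the study).
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__gamma": ["scale", 0.01],
    "svr__epsilon": [0.01, 0.1],
}

# Scale the features, then fit an RBF-kernel SVR; the grid search picks the
# combination with the lowest cross-validated mean absolute error.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
search = GridSearchCV(model, param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)

# Error on the held-out systems, as a rough transferability check.
test_mae = float(np.mean(np.abs(search.predict(X_test) - y_test)))
print("best parameters:", search.best_params_)
print("held-out MAE:", test_mae)
```

Splitting by chemical system rather than at random is the design choice that matters here: a random split would place Si–Na and Si–Al structures in the training set and would say little about transfer to unseen alloys.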
All Science Journal Classification (ASJC) codes
- Ceramics and Composites
- Materials Science (miscellaneous)
- General Materials Science
- Mechanics of Materials
- Mechanical Engineering
- Polymers and Plastics