TY - GEN
T1 - Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
AU - Du, Mengnan
AU - Mukherjee, Subhabrata
AU - Cheng, Yu
AU - Shokouhi, Milad
AU - Hu, Xia
AU - Awadallah, Ahmed Hassan
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Recent work has focused on compressing pretrained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models for out-of-distribution (OOD) data. Towards this end, we study two popular model compression techniques including knowledge distillation and pruning and show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit on the shortcut samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for robust model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our bias mitigation framework improves the OOD generalization of the compressed models, while not sacrificing the in-distribution task performance.
AB - Recent work has focused on compressing pretrained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models for out-of-distribution (OOD) data. Towards this end, we study two popular model compression techniques including knowledge distillation and pruning and show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit on the shortcut samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for robust model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our bias mitigation framework improves the OOD generalization of the compressed models, while not sacrificing the in-distribution task performance.
UR - http://www.scopus.com/inward/record.url?scp=85159850694&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159850694&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85159850694
T3 - EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
SP - 1758
EP - 1770
BT - EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023
Y2 - 2 May 2023 through 6 May 2023
ER -