TY - GEN
T1 - Identification of user application by an external eavesdropper using machine learning analysis on network traffic
AU - Fathi-Kazerooni, Sina
AU - Kaymak, Yagiz
AU - Rojas-Cessa, Roberto
PY - 2019/5
Y1 - 2019/5
N2 - An eavesdropper may infer the computer applications a person uses by collecting and analyzing the network traffic they generate. Such inference may be performed even when the generated packets are encrypted. In this paper, we investigate the extent to which several machine learning algorithms can perform this privacy breach on the network traffic generated by a user. We measure their accuracy in identifying different applications by analyzing several statistical properties of the generated traffic rather than looking into the encrypted content. We compare the performance of these algorithms and select the one with the highest precision: random forest. We also evaluate the application of packet padding to modify the packet length to avoid identification by machine learning algorithms. We test the effect of packet padding on the identification ability of the various machine learning algorithms. We investigate the performance of the random forest algorithm in detail when applied to intact and padded traffic. We show that padding may decrease the efficacy of a machine learning algorithm when used for application classification.
AB - An eavesdropper may infer the computer applications a person uses by collecting and analyzing the network traffic they generate. Such inference may be performed even when the generated packets are encrypted. In this paper, we investigate the extent to which several machine learning algorithms can perform this privacy breach on the network traffic generated by a user. We measure their accuracy in identifying different applications by analyzing several statistical properties of the generated traffic rather than looking into the encrypted content. We compare the performance of these algorithms and select the one with the highest precision: random forest. We also evaluate the application of packet padding to modify the packet length to avoid identification by machine learning algorithms. We test the effect of packet padding on the identification ability of the various machine learning algorithms. We investigate the performance of the random forest algorithm in detail when applied to intact and padded traffic. We show that padding may decrease the efficacy of a machine learning algorithm when used for application classification.
KW - Internet Traffic Classification
KW - Machine Learning
KW - Multi-layer perceptron
KW - Online activity tracking
KW - Random Forest
KW - Support-vector machines
UR - http://www.scopus.com/inward/record.url?scp=85070292415&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070292415&partnerID=8YFLogxK
U2 - 10.1109/ICCW.2019.8756709
DO - 10.1109/ICCW.2019.8756709
M3 - Conference contribution
T3 - 2019 IEEE International Conference on Communications Workshops, ICC Workshops 2019 - Proceedings
BT - 2019 IEEE International Conference on Communications Workshops, ICC Workshops 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Communications Workshops, ICC Workshops 2019
Y2 - 20 May 2019 through 24 May 2019
ER -