TY - GEN
T1 - Algebraic-datatype taint tracking, with applications to understanding Android identifier leaks
AU - Rahaman, Sydur
AU - Neamtiu, Iulian
AU - Yin, Xin
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/8/20
Y1 - 2021/8/20
N2 - Current taint analyses track flow from sources to sinks, and report the results simply as source → sink pairs, or flows. This is imprecise and ineffective in many real-world scenarios; examples include taint sources that are mutually exclusive, or flows that combine sources (e.g., IMEI and MAC address are concatenated, hashed, and leaked vs. IMEI and MAC address are hashed separately and leaked separately). These shortcomings are particularly acute in the context of Android, where sensitive identifiers can be combined, processed, and then leaked in complicated ways. To address these issues, we introduce a novel, algebraic-datatype taint analysis that generates rich yet concise taint signatures involving AND, XOR, and hashing, akin to algebraic (product and sum) types. We implemented our approach as a static analysis for Android that derives app leak signatures: an algebraic representation of how, and where, hardware/software identifiers are manipulated before being exfiltrated to the network. We perform six empirical studies of algebraic-datatype taint tracking on 1,000 top apps from Google Play and their embedded libraries, including: discerning between "raw" and hashed flows, which eliminates a source of imprecision in current analyses; finding apps and libraries that go against Google Play's guidelines by (ab)using hardware identifiers; showing that third-party code, rather than app code, is the predominant source of leaks; exposing potential de-anonymization practices; and quantifying how apps have become more privacy-friendly over the past two years.
AB - Current taint analyses track flow from sources to sinks, and report the results simply as source → sink pairs, or flows. This is imprecise and ineffective in many real-world scenarios; examples include taint sources that are mutually exclusive, or flows that combine sources (e.g., IMEI and MAC address are concatenated, hashed, and leaked vs. IMEI and MAC address are hashed separately and leaked separately). These shortcomings are particularly acute in the context of Android, where sensitive identifiers can be combined, processed, and then leaked in complicated ways. To address these issues, we introduce a novel, algebraic-datatype taint analysis that generates rich yet concise taint signatures involving AND, XOR, and hashing, akin to algebraic (product and sum) types. We implemented our approach as a static analysis for Android that derives app leak signatures: an algebraic representation of how, and where, hardware/software identifiers are manipulated before being exfiltrated to the network. We perform six empirical studies of algebraic-datatype taint tracking on 1,000 top apps from Google Play and their embedded libraries, including: discerning between "raw" and hashed flows, which eliminates a source of imprecision in current analyses; finding apps and libraries that go against Google Play's guidelines by (ab)using hardware identifiers; showing that third-party code, rather than app code, is the predominant source of leaks; exposing potential de-anonymization practices; and quantifying how apps have become more privacy-friendly over the past two years.
KW - android
KW - fingerprinting
KW - identifier leak
KW - mobile security
KW - static analysis
KW - taint analysis
UR - http://www.scopus.com/inward/record.url?scp=85116251941&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116251941&partnerID=8YFLogxK
U2 - 10.1145/3468264.3468550
DO - 10.1145/3468264.3468550
M3 - Conference contribution
AN - SCOPUS:85116251941
T3 - ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 70
EP - 82
BT - ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Spinellis, Diomidis
PB - Association for Computing Machinery, Inc
T2 - 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Y2 - 23 August 2021 through 28 August 2021
ER -