Algebraic-datatype taint tracking, with applications to understanding Android identifier leaks

Sydur Rahaman, Iulian Neamtiu, Xin Yin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Current taint analyses track flow from sources to sinks, and report the results simply as source → sink pairs, or flows. This is imprecise and ineffective in many real-world scenarios; examples include taint sources that are mutually exclusive, or flows that combine sources (e.g., IMEI and MAC address are concatenated, hashed, and leaked vs. IMEI and MAC address hashed separately and leaked separately). These shortcomings are particularly acute in the context of Android, where sensitive identifiers can be combined, processed, and then leaked in complicated ways. To address these issues, we introduce a novel, algebraic-datatype taint analysis that generates rich yet concise taint signatures involving AND, XOR, and hashing, akin to algebraic product and sum types. We implemented our approach as a static analysis for Android that derives app leak signatures: an algebraic representation of how, and where, hardware/software identifiers are manipulated before being exfiltrated to the network. We perform six empirical studies of algebraic-datatype taint tracking on 1,000 top apps from Google Play and their embedded libraries, including: discerning between "raw" and hashed flows, which eliminates a source of imprecision in current analyses; finding apps and libraries that go against Google Play's guidelines by (ab)using hardware identifiers; showing that third-party code, rather than app code, is the predominant source of leaks; exposing potential de-anonymization practices; and quantifying how apps have become more privacy-friendly over the past two years.
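
As a rough illustration of the idea in the abstract, the sketch below models taint signatures as a small algebraic datatype and shows why flat source → sink pairs cannot tell a combined leak from separate leaks. It is not the authors' implementation: the class names (`Source`, `And`, `Xor`, `Hash`), the helper functions, and the textual rendering are all hypothetical, chosen only to mirror the AND, XOR, and hashing constructors the abstract mentions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Source:
    """A raw taint source, e.g. "IMEI" (hypothetical naming)."""
    name: str


@dataclass(frozen=True)
class And:
    """Product-like constructor: both operands flow together."""
    left: "Sig"
    right: "Sig"


@dataclass(frozen=True)
class Xor:
    """Sum-like constructor: exactly one operand flows (mutually exclusive)."""
    left: "Sig"
    right: "Sig"


@dataclass(frozen=True)
class Hash:
    """The operand is hashed before it reaches the sink."""
    inner: "Sig"


Sig = Union[Source, And, Xor, Hash]


def render(sig: Sig) -> str:
    """Pretty-print a signature in an assumed, illustrative notation."""
    if isinstance(sig, Source):
        return sig.name
    if isinstance(sig, Hash):
        return f"hash({render(sig.inner)})"
    if isinstance(sig, And):
        return f"AND({render(sig.left)}, {render(sig.right)})"
    if isinstance(sig, Xor):
        return f"XOR({render(sig.left)}, {render(sig.right)})"
    raise TypeError(f"not a signature: {sig!r}")


def sources(sig: Sig) -> FrozenSet[str]:
    """Collapse a signature to the set of sources a flat analysis would report."""
    if isinstance(sig, Source):
        return frozenset({sig.name})
    if isinstance(sig, Hash):
        return sources(sig.inner)
    if isinstance(sig, (And, Xor)):
        return sources(sig.left) | sources(sig.right)
    raise TypeError(f"not a signature: {sig!r}")


def flat_view(sigs: List[Sig]) -> FrozenSet[str]:
    """Union of sources over all signatures reaching one sink."""
    result: FrozenSet[str] = frozenset()
    for sig in sigs:
        result |= sources(sig)
    return result


imei, mac = Source("IMEI"), Source("MAC")

# IMEI and MAC address combined, then hashed, then leaked as one value.
combined_leak = [Hash(And(imei, mac))]
# IMEI and MAC address hashed separately and leaked separately.
separate_leaks = [Hash(imei), Hash(mac)]

# The flat view is identical for both scenarios...
same_flat = flat_view(combined_leak) == flat_view(separate_leaks)
# ...while the algebraic signatures differ.
combined_text = [render(s) for s in combined_leak]
separate_text = [render(s) for s in separate_leaks]
```

In this sketch, `same_flat` is `True` even though `combined_text` and `separate_text` differ, which is the loss of information that the abstract attributes to reporting flows only as source → sink pairs.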

Original language: English (US)
Title of host publication: ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Diomidis Spinellis
Publisher: Association for Computing Machinery, Inc
Pages: 70-82
Number of pages: 13
ISBN (Electronic): 9781450385626
DOIs
State: Published - Aug 20 2021
Event: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration: Aug 23 2021 to Aug 28 2021

Publication series

Name: ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/Territory: Greece
City: Virtual, Online
Period: 8/23/21 to 8/28/21

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software

Keywords

  • android
  • fingerprinting
  • identifier leak
  • mobile security
  • static analysis
  • taint analysis
