An ensemble deep learning model for drug abuse detection in sparse twitter-sphere

Han Hu, Nhat Hai Phan, James Geller, Stephen Iezzi, Huy Vo, Dejing Dou, Soon Ae Chun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

As the problem of drug abuse intensifies in the U.S., many studies rely primarily on social media data, such as postings on Twitter, to study drug abuse-related activities, using machine learning as a powerful tool for text classification and filtering. However, given the wide range of topics that Twitter users discuss, tweets related to drug abuse are rare in most datasets. This data imbalance remains a major issue in building effective tweet classifiers, and is especially pronounced in studies that include abuse-related slang terms. In this study, we approach this problem by designing an ensemble deep learning model that leverages both word-level and character-level features to classify abuse-related tweets. Experiments are reported on a Twitter dataset in which we can configure the percentages of the two classes (abuse vs. non-abuse) to simulate data imbalance of varying severity. Results show that our ensemble deep learning models perform better than ensembles of traditional machine learning models, especially on heavily imbalanced datasets.
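
The abstract mentions configuring the percentages of the two classes to simulate imbalance. The record does not describe how the authors did this, so the following is only a minimal sketch of one plausible approach: subsampling two labeled pools to a target abuse fraction. The function name `make_imbalanced` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def make_imbalanced(abuse, non_abuse, abuse_fraction, seed=0):
    """Return shuffled (text, label) pairs with roughly `abuse_fraction` positives.

    Hypothetical helper for illustration only; label 1 = abuse, 0 = non-abuse.
    """
    if not 0.0 < abuse_fraction < 1.0:
        raise ValueError("abuse_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Largest total size for which both pools still hold enough examples.
    total = int(min(len(abuse) / abuse_fraction,
                    len(non_abuse) / (1.0 - abuse_fraction)))
    n_abuse = round(total * abuse_fraction)
    n_non = total - n_abuse
    # Sample without replacement from each pool, then mix the classes.
    sample = [(text, 1) for text in rng.sample(abuse, n_abuse)]
    sample += [(text, 0) for text in rng.sample(non_abuse, n_non)]
    rng.shuffle(sample)
    return sample
```

Calling `make_imbalanced(abuse_tweets, other_tweets, 0.1)` with two lists of strings would yield a dataset in which about one example in ten is labeled as abuse; lowering the fraction simulates heavier imbalance.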

Original language: English (US)
Title of host publication: MEDINFO 2019
Subtitle of host publication: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics
Editors: Brigitte Seroussi, Lucila Ohno-Machado
Publisher: IOS Press
Pages: 163-167
Number of pages: 5
ISBN (Electronic): 9781643680026
DOIs
State: Published - Aug 21 2019
Event: 17th World Congress on Medical and Health Informatics, MEDINFO 2019 - Lyon, France
Duration: Aug 25 2019 → Aug 30 2019

Publication series

Name: Studies in Health Technology and Informatics
Volume: 264
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Conference

Conference: 17th World Congress on Medical and Health Informatics, MEDINFO 2019
Country/Territory: France
City: Lyon
Period: 8/25/19 → 8/30/19

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Keywords

  • Machine Learning
  • Social Media
  • Substance-Related Disorders
