Poster: Identifying SMILES from Molecular Structure Images

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Accurate extraction of Simplified Molecular Input Line Entry System (SMILES) representations from molecular structure images is crucial for computational chemistry and cheminformatics. This study presents a machine learning approach that converts graphical molecular representations into SMILES notation, addressing the challenges of chemical image recognition. We developed a deep learning model using the TensorFlow/Keras framework, cheminformatics tools such as RDKit, and Long Short-Term Memory (LSTM) networks for sequence learning. Trained on a curated dataset of molecular images, the model learns the relationship between molecular structures and their text representations, enabling high-accuracy SMILES predictions. Our approach supports chemical data digitization, improves dataset accuracy, and can accelerate molecular property assessment and drug discovery, contributing to pharmaceutical research and personalized medicine. By training and testing the model as an image-captioning system, we obtain robust SMILES generation that remains faithful to the input molecular structures.
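The abstract frames SMILES recognition as image captioning with an LSTM decoder. The sketch below is not the authors' code; it is a minimal, standard-library-only illustration, under stated assumptions, of the text side of such a pipeline: splitting SMILES into tokens with a regular expression (an assumed tokenization scheme, not one stated in the poster), mapping tokens to integer IDs with hypothetical `<start>`, `<end>`, and `<pad>` markers, and greedily decoding a caption from a hypothetical `step` function that stands in for the trained image encoder and LSTM decoder.

```python
import re
from typing import Callable, Dict, List, Sequence

# Assumed tokenization scheme (not taken from the poster): bracket atoms,
# two-letter halogens, %NN ring closures, single-letter atoms, bond and
# branch symbols, and single-digit ring-closure labels.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[()=#\-+\\/:~@?>*$.]|\d)"
)

# Hypothetical special tokens for sequence-to-sequence training.
PAD, START, END = "<pad>", "<start>", "<end>"


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; reject characters the regex skips."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus: Sequence[str]) -> Dict[str, int]:
    """Assign integer IDs to tokens in order of first appearance."""
    vocab = {PAD: 0, START: 1, END: 2}
    for smiles in corpus:
        for tok in tokenize_smiles(smiles):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Wrap token IDs in <start>/<end> and right-pad with <pad> to max_len."""
    ids = [vocab[START]] + [vocab[t] for t in tokenize_smiles(smiles)] + [vocab[END]]
    if len(ids) > max_len:
        raise ValueError(f"{smiles!r} needs {len(ids)} positions, max_len is {max_len}")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def greedy_decode(
    step: Callable[[List[int]], Sequence[float]],
    vocab: Dict[str, int],
    max_len: int,
) -> str:
    """Emit the highest-scoring token at each step until <end> or max_len.

    `step` is a hypothetical stand-in for the trained model: given the token
    IDs generated so far, it returns one score per vocabulary entry for the
    next token, conditioned on the (implicit) input image.
    """
    id_to_token = {i: t for t, i in vocab.items()}
    ids = [vocab[START]]
    while len(ids) < max_len:
        scores = step(ids)
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        if next_id == vocab[END]:
            break
        ids.append(next_id)
    return "".join(id_to_token[i] for i in ids[1:])
```

In a TensorFlow/Keras implementation, `step` would typically wrap the trained image encoder and LSTM decoder, and beam search is a common alternative to greedy decoding. Because one molecule can have many valid SMILES strings, predictions are often compared with references after canonicalization, for example with RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(s))`.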

Original language: English (US)
Title of host publication: Proceedings - 2025 IEEE/ACM International Conference on Connected Health
Subtitle of host publication: Applications, Systems and Engineering Technologies, CHASE 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 484-485
Number of pages: 2
ISBN (Electronic): 9798400715396
State: Published - 2025
Externally published: Yes
Event: 10th IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies, CHASE 2025 - Manhattan, United States
Duration: Jun 24 2025 to Jun 26 2025

Publication series

Name: Proceedings - 2025 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies, CHASE 2025

Conference

Conference: 10th IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies, CHASE 2025
Country/Territory: United States
City: Manhattan
Period: 6/24/25 to 6/26/25

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Information Systems and Management
  • Control and Optimization

Keywords

  • Bioinformatics
  • Data extraction
  • Image recognition
  • Neural networks
