K-means-based feature learning for protein sequence classification

Paul Melman, Usman W. Roshan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Protein sequence classification has long been a major challenge in bioinformatics and related fields, and it remains one today. Given the complexity and volume of protein data, time and memory constraints often make algorithmic techniques such as sequence alignment unsuitable. Heuristic methods based on machine learning are the dominant technique for classifying large sets of protein data. In recent years, unsupervised deep learning techniques have garnered significant attention across many classification tasks, especially those involving image data. In this study, we adapt a k-means-based deep learning approach that was originally developed for image classification to classify protein sequence data. We use this unsupervised learning method to preprocess the data and create new feature vectors, which are then classified by a traditional supervised learning algorithm such as an SVM. We find that this technique outperforms the spectrum kernel and the empirical kernel map, and performs comparably to slower distance-matrix-based approaches.
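The abstract does not spell out the pipeline, so the following is only a rough, hypothetical sketch, assuming the approach resembles the single-layer k-means feature learning commonly applied to image patches. Overlapping windows of residues stand in for image patches, k-means learns centroids over all windows, and each sequence is mapped to a fixed-length vector by a "triangle" activation averaged over its windows. The one-hot encoding, window width, K, the triangle activation, average pooling, the function names (`windows`, `kmeans`, `encode`) and the toy sequences are all illustrative assumptions, not the authors' settings.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_window(window):
    """Flatten a window of residues into a one-hot vector of length 20 * len(window)."""
    vec = [0.0] * (len(AMINO_ACIDS) * len(window))
    for pos, aa in enumerate(window):
        if aa in INDEX:  # non-standard residues (e.g. X) stay all-zero
            vec[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1.0
    return vec


def windows(seq, width):
    """All overlapping length-`width` windows (the 1-D analogue of image patches)."""
    return [one_hot_window(seq[i:i + width]) for i in range(len(seq) - width + 1)]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: returns k centroids learned from `points`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(seq, centroids, width):
    """Map a sequence of any length to a fixed-length K-dimensional feature vector.

    Each window gets the triangle activation max(0, mean(z) - z_k), where z_k is
    its distance to centroid k; activations are then averaged over all windows.
    """
    k = len(centroids)
    pooled = [0.0] * k
    wins = windows(seq, width)
    for w in wins:
        z = [dist(w, c) for c in centroids]
        mu = sum(z) / k
        for j in range(k):
            pooled[j] += max(0.0, mu - z[j])
    return [v / len(wins) for v in pooled] if wins else pooled


# Toy usage on made-up sequences (hypothetical data, not from the paper).
train = ["MKTAYIAKQRQISFVKSHFSRQ", "MKVLAAGIVGLLLAAAQPAMA", "GSHMSLFDFFKNKGSA"]
WIDTH, K = 3, 4
patches = [w for s in train for w in windows(s, WIDTH)]
centroids = kmeans(patches, K)
features = [encode(s, centroids, WIDTH) for s in train]
```

Because every sequence, whatever its length, ends up as a length-K vector, the resulting features can be passed directly to a standard supervised classifier such as an SVM (not shown here).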

Original language: English (US)
Title of host publication: Proceedings of the 10th International Conference on Bioinformatics and Computational Biology, BICOB 2018
Editors: Hisham Al-Mubaid, Oliver Eulenstein, Qin Ding
Publisher: The International Society for Computers and Their Applications (ISCA)
ISBN (Electronic): 9781943436118
State: Published - 2018
Event: 10th International Conference on Bioinformatics and Computational Biology, BICOB 2018 - Las Vegas, United States
Duration: Mar 19 2018 → Mar 21 2018

Publication series

Name: Proceedings of the 10th International Conference on Bioinformatics and Computational Biology, BICOB 2018
Volume: 2018-March

Other

Other: 10th International Conference on Bioinformatics and Computational Biology, BICOB 2018
Country/Territory: United States
City: Las Vegas
Period: 3/19/18 → 3/21/18

All Science Journal Classification (ASJC) codes

  • Health Information Management
  • Biomedical Engineering
  • Computer Science Applications

Keywords

  • K-means
  • Protein classification
  • Unsupervised learning
