Abstract
Traditional machine learning approaches to drug sensitivity prediction assume that training data and test data must be in the same feature space and have the same underlying distribution. However, in real-world applications, this assumption does not hold. For example, we sometimes have limited training data for the task of drug sensitivity prediction in multiple myeloma patients (target task), but we have sufficient auxiliary data for the task of drug sensitivity prediction in patients with another cancer type (related task), where the auxiliary data for the related task are in a different feature space or have a different distribution. In such cases, transfer learning, if applied correctly, would improve the performance of prediction algorithms on the test data of the target task via leveraging the auxiliary data from the related task. In this paper, we present two transfer learning approaches that combine the auxiliary data from the related task with the training data of the target task to improve the prediction performance on the test data of the target task. We evaluate the performance of our transfer learning approaches exploiting three auxiliary data sets and compare them against baseline approaches using the area under the receiver operating characteristic curve on the test data of the target task. Experimental results demonstrate the good performance of our approaches and their superiority over the baseline approaches when auxiliary data are incorporated.
Original language | English (US) |
---|---|
Article number | 7907277 |
Pages (from-to) | 7381-7393 |
Number of pages | 13 |
Journal | IEEE Access |
Volume | 5 |
DOIs | |
State | Published - 2017 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- General Computer Science
- General Materials Science
- General Engineering
Keywords
- Machine learning
- cancer drug discovery
- clinical informatics
- data mining
- precision medicine