Improving synthetic efficiency using the computational prediction of biological activity

K. C. Broglé, T. Gund, D. J. Kyle

Research output: Contribution to journal › Article › peer-review


A process has been developed whereby libraries of compounds for lead optimization can be synthesized and screened with greater efficiency using computational tools. In this method, analogues of a lead chemical structure are considered in the form of a virtual library. Less than one third of the library is selected as a training set by clustering the compounds and choosing the centroid of each cluster. The training set is synthesized and assayed, and a model is then generated by PLS regression of the experimental assay values on 1D/2D descriptors. The model is applied to the remaining compounds (the test set), for which assay values are predicted and a rank ordering is established. As an example, the method was applied to a set of 169 PDE4 inhibitors. A predictive model was achieved using a training set of 52 compounds. When applied to the remaining 117 compounds, this model allowed these compounds to be rank-ordered for synthesis and testing. Selecting the top 33 compounds of the test set gives 78% of the compounds with the desired activity (hits) by synthesizing only 50% of the library, including the training set. Selecting the top 59 of the test set gives 97% of the hits from only 67% of the library. This process succeeds by avoiding two principal weaknesses of 2D descriptors: lack of interpretation and lack of extrapolation. Two principal assumptions of QSAR are shown to be unnecessary: removing descriptor redundancy does not improve fit, and a predictive r² greater than 0.5 is not necessary if rank-ordering is desired.
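The workflow in the abstract (cluster the virtual library, take one representative per cluster as the training set, fit a PLS model, then rank the remaining compounds) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain k-means for the clustering, a one-component PLS1 fit, unscaled descriptors, and an activity scale where higher means more active. The descriptor matrix and activities are random toy data, and all function names are hypothetical.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_training_set(X, k, rng, iters=20):
    """Cluster rows of X with plain k-means (an assumed clustering method)
    and return, for each non-empty cluster, the row nearest its centroid."""
    p = len(X[0])
    centroids = [list(X[i]) for i in rng.sample(range(len(X)), k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, row in enumerate(X):
            nearest = min(range(k), key=lambda c: dist2(row, centroids[c]))
            groups[nearest].append(i)
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [sum(X[i][d] for i in members) / len(members)
                                for d in range(p)]
    return sorted(min(g, key=lambda i: dist2(X[i], centroids[c]))
                  for c, g in enumerate(groups) if g)


def fit_pls1(X, y):
    """One-component PLS1: weight vector w is proportional to Xc^T yc,
    scores t = Xc w, and q regresses the centered response on the scores."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[d] for r in X) / n for d in range(p)]
    y_mean = sum(y) / n
    Xc = [[r[d] - x_mean[d] for d in range(p)] for r in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][d] * yc[i] for i in range(n)) for d in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    t = [sum(Xc[i][d] * w[d] for d in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / (sum(v * v for v in t) or 1.0)
    return x_mean, y_mean, w, q


def predict(model, row):
    """Predicted activity for one compound's descriptor vector."""
    x_mean, y_mean, w, q = model
    return y_mean + q * sum((row[d] - x_mean[d]) * w[d] for d in range(len(row)))


# Toy data only: 60 hypothetical compounds with 5 made-up descriptors each.
rng = random.Random(0)
n, p, k = 60, 5, 18  # k < n/3, mirroring "less than one third of the library"
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_w = [1.0, -0.5, 0.25, 0.0, 0.75]
y = [sum(a * b for a, b in zip(row, true_w)) + rng.gauss(0, 0.3) for row in X]

train = pick_training_set(X, k, rng)          # compounds to synthesize and assay first
train_set = set(train)
test = [i for i in range(n) if i not in train_set]
model = fit_pls1([X[i] for i in train], [y[i] for i in train])

# Rank the untested compounds by predicted activity, most promising first.
ranked = sorted(test, key=lambda i: predict(model, X[i]), reverse=True)
shortlist = ranked[:10]
print("training set size:", len(train))
print("first compounds to synthesize next:", shortlist)
```

Only the ordering of the predictions is used to build the shortlist, which is consistent with the abstract's point that a modest predictive r² can still be sufficient when the goal is rank-ordering rather than accurate activity values.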

Original language: English (US)
Pages (from-to): 103-113
Number of pages: 11
Journal: Combinatorial Chemistry and High Throughput Screening
Issue number: 2
State: Published - Feb 2006

All Science Journal Classification (ASJC) codes

  • Drug Discovery
  • Computer Science Applications
  • Organic Chemistry

