Title: Classification of molecular characteristics to identify disease targets by score function of violations
Authors: Shalem Leemaqz - Robinson Research Institute, University of Adelaide (Australia) [presenting]
Irene Hudson - Swinburne University of Technology (Australia)
Andrew Abell - University of Adelaide (Australia)
Shalem Leemaqz - South Australian Health and Medical Research Institute; University of Adelaide (Australia)
Abstract: Expanding the set of disease-modifying targets amenable to pharmacological manipulation is vital to reducing morbidity and mortality, and is critical to drug discovery. Modelling disease targets would allow them to be predicted and prioritised on the basis of their molecular characteristics and druggability. Classification rules based on 8 molecular parameters were built using support vector machine (SVM), recursive partitioning (RP) and random forest (RF) methods to classify disease targets as having high ($\ge 4$) or low ($<4$) violator scores. Predictors include the 4 traditional parameters of Lipinski's rule of five (Ro5), plus 4 extra parameters (polar surface area (PSA), number of rotatable bonds and rings, N and O atoms, and distribution coefficient log D). A total of 1279 small molecules from the DrugBank database (Knox 2011), which combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with drug disease target information, were analysed and were shown to align with 172 targets. For the validation set, SVM gave an AUC of 93.2\% (95\% CI 85.8\%-99.9\%). The RF classification gave a similar but slightly lower AUC (91.5\%; 95\% CI 83.4\%-99.5\%), followed by RP with an AUC of 87.5\% (95\% CI 78\%-97\%). The results illustrate that SVM, used in combination with simple molecular descriptors of disease targets, can provide a reliable assessment of violation scores, and hence druggability.
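The high/low violator score used as the classification outcome can be illustrated with a minimal sketch. It assumes that the score is a simple count of descriptors exceeding an upper limit, which the abstract does not spell out. The first four limits are the usual Lipinski Ro5 cutoffs; the limits for the four extra parameters, the descriptor names, and the two example molecules are hypothetical placeholders and are not taken from the study.

```python
from typing import Dict, Mapping

# Upper limit per descriptor; a value above its limit counts as one violation.
# The first four are the standard Lipinski rule-of-five cutoffs.
# The last four are ASSUMED placeholders: the abstract does not state them.
ASSUMED_LIMITS: Dict[str, float] = {
    "mol_weight": 500.0,              # Ro5: molecular weight (Da)
    "log_p": 5.0,                     # Ro5: partition coefficient
    "h_bond_donors": 5,               # Ro5: hydrogen-bond donors
    "h_bond_acceptors": 10,           # Ro5: hydrogen-bond acceptors
    "psa": 140.0,                     # assumed: polar surface area (square angstroms)
    "rotatable_bonds_and_rings": 10,  # assumed
    "n_o_atoms": 10,                  # assumed
    "log_d": 5.0,                     # assumed: distribution coefficient
}


def violator_score(descriptors: Mapping[str, float],
                   limits: Mapping[str, float] = ASSUMED_LIMITS) -> int:
    """Count how many descriptors exceed their upper limit."""
    return sum(1 for name, limit in limits.items() if descriptors[name] > limit)


def violator_class(descriptors: Mapping[str, float],
                   limits: Mapping[str, float] = ASSUMED_LIMITS,
                   cutoff: int = 4) -> str:
    """Label a molecule 'high' (score >= cutoff) or 'low' (score < cutoff)."""
    return "high" if violator_score(descriptors, limits) >= cutoff else "low"


# Two made-up molecules, for illustration only (not DrugBank records).
small_molecule = {
    "mol_weight": 350.0, "log_p": 2.1, "h_bond_donors": 2,
    "h_bond_acceptors": 5, "psa": 80.0, "rotatable_bonds_and_rings": 6,
    "n_o_atoms": 6, "log_d": 1.5,
}
large_molecule = {
    "mol_weight": 900.0, "log_p": 6.2, "h_bond_donors": 7,
    "h_bond_acceptors": 14, "psa": 210.0, "rotatable_bonds_and_rings": 15,
    "n_o_atoms": 18, "log_d": 0.5,
}

print(violator_score(small_molecule), violator_class(small_molecule))
print(violator_score(large_molecule), violator_class(large_molecule))
```

Passing the limits as a mapping keeps the cutoffs easy to replace with the values actually used in a given analysis. The resulting binary label is the kind of outcome that classifiers such as SVM, RP and RF would then be trained to predict from the 8 descriptors.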