Ion Regulation Assay Classification
31st March 2017 @ 11:01

I developed a gradient boosting model (using xgboost) to predict actives and non-actives for the PfATP4 ion regulation assay. I sampled the data to include only compounds in the vicinity of the OSM S4 compounds.

A compound network was a useful way to analyse this data set. I calculated a matrix of Tanimoto similarities based on ECFP4 fingerprints and used a threshold of 0.28, following the approach described by Zahoránszky-Kőhalmi et al. (2016) [1]. Here is how it looks, with nodes colored by origin:

OSM Series 4 Competition, compounds network by origin

 

OSM Series 4 Competition, nodes colored by Ion Regulation Assay column
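
The network construction described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the figures: fingerprints are represented as plain sets of on-bit indices rather than real ECFP4 fingerprints, the compound names and bits are hypothetical, and treating a similarity exactly equal to 0.28 as an edge is an assumption.

```python
from itertools import combinations

SIMILARITY_THRESHOLD = 0.28  # threshold quoted in the text


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_network(fingerprints, threshold=SIMILARITY_THRESHOLD):
    """Return edges (id_a, id_b, similarity) for pairs at or above the threshold."""
    edges = []
    for id_a, id_b in combinations(sorted(fingerprints), 2):
        sim = tanimoto(fingerprints[id_a], fingerprints[id_b])
        if sim >= threshold:  # assumption: inclusive threshold
            edges.append((id_a, id_b, sim))
    return edges


# Hypothetical toy fingerprints (not real ECFP4 bits), for illustration only.
toy_fingerprints = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {3, 4, 5, 6},
    "cmpd_C": {7, 8, 9},
}
edges = similarity_network(toy_fingerprints)
```

In this toy case only cmpd_A and cmpd_B are connected (2 shared bits out of 6 in the union), while cmpd_C stays an isolated node.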

I selected 156 compounds in the neighbourhood of OSM Series 4 based on the network above and performed 2 rounds of 10-fold cross-validation using the R package caret, with a grid search over the model parameters. The results of the best model are shown below:

The final values used for the Gradient Boosting Model were nrounds = 50, max_depth = 1, eta = 0.3, gamma = 0, colsample_bytree = 0.6 and min_child_weight = 1.
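
For readers unfamiliar with the resampling scheme, here is a minimal sketch of how 2 rounds of 10-fold cross-validation partition a data set. It is not the caret implementation: caret typically stratifies folds by class, whereas this sketch shuffles indices without stratification. The function name `repeated_kfold`, the seed, and the use of all 156 compounds as the resampled set are assumptions made for illustration.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=2, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so fold sizes differ by at most one.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Assumption: all 156 selected compounds are resampled.
splits = list(repeated_kfold(n_samples=156))
```

Each candidate parameter combination in the grid would be fitted on every `train_idx` and scored on the matching `test_idx`, and the combination with the best average score kept.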

Confusion Matrix and Statistics:

 

          Reference
Prediction Inactive Partial Active
  Inactive       12       1      0
  Partial         0       1      0
  Active          1       2     18

Overall Statistics

               Accuracy : 0.8857         
                 95% CI : (0.7326, 0.968)
    No Information Rate : 0.5143         
    P-Value [Acc > NIR] : 3.724e-06      
                                         
                  Kappa : 0.7923         
 Mcnemar's Test P-Value : 0.2615

  Statistics by Class:

                     Class: Inactive Class: Partial Class: Active
Sensitivity                   0.9231        0.25000        1.0000
Specificity                   0.9545        1.00000        0.8235
Pos Pred Value                0.9231        1.00000        0.8571
Neg Pred Value                0.9545        0.91176        1.0000
Prevalence                    0.3714        0.11429        0.5143
Detection Rate                0.3429        0.02857        0.5143
Detection Prevalence          0.3714        0.02857        0.6000
Balanced Accuracy             0.9388        0.62500        0.9118
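
As a check on how these figures follow from the confusion matrix, the sketch below recomputes accuracy, Cohen's kappa, and the per-class sensitivity, specificity and balanced accuracy using the standard definitions. The counts are copied from the table above; the function names are my own and not part of caret.

```python
CLASSES = ["Inactive", "Partial", "Active"]
# Rows are predictions, columns are reference labels, as in the table above.
MATRIX = [
    [12, 1, 0],
    [0, 1, 0],
    [1, 2, 18],
]


def overall_stats(m):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(map(sum, m))
    k = len(m)
    observed = sum(m[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in m]
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


def class_stats(m, j):
    """Return (sensitivity, specificity, balanced accuracy) for class j vs. rest."""
    n = sum(map(sum, m))
    tp = m[j][j]
    fp = sum(m[j]) - tp                  # predicted j, reference not j
    fn = sum(row[j] for row in m) - tp   # reference j, predicted not j
    tn = n - tp - fp - fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


accuracy, kappa = overall_stats(MATRIX)
print(f"Accuracy {accuracy:.4f}, Kappa {kappa:.4f}")
for j, name in enumerate(CLASSES):
    sens, spec, bal = class_stats(MATRIX, j)
    print(f"{name}: sensitivity {sens:.4f}, specificity {spec:.4f}, "
          f"balanced accuracy {bal:.4f}")
```

The low sensitivity for the Partial class comes from only 1 of the 4 reference Partial compounds being predicted as Partial.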

 

 

[1] Zahoránszky-Kőhalmi, et al. 2016. “Impact of Similarity Threshold on the Topology of Molecular Similarity Networks and Clustering Outcomes.” Journal of Cheminformatics 8 (1): 16. doi:10.1186/s13321-016-0127-5.