19th March 2012 @ 10:33
Target predictions run by Iain Wallace of ChEMBL for the compounds in the Malaria Box.

Malaria Box Prediction

Protein targets were predicted computationally from compound structure using a statistical model derived from the ChEMBL database. Specifically, we used a multi-category Naive Bayes model that identifies compound structural features (specific sets of atoms and bonds generated using the ECFP_4 fingerprint) that are correlated with a particular data class (i.e. a protein target). A set of active compounds was created for each protein in the ChEMBL database for which at least 50 compounds (to ensure a robust model) were annotated with an activity <10 µM. A multi-category model was then built with Pipeline Pilot for each of these sets (i.e. the compounds active against that protein) versus all of the other compounds (assumed inactive). By scoring each compound against all 1,287 models, a ranked list of up to the top 50 predicted protein targets (those with a model score >0) was generated for each compound. The scores for each individual protein target were standardized by comparison with the scores obtained for a random set of >10,000 compounds.
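The workflow described above can be illustrated with a minimal Python sketch. This is not the Pipeline Pilot implementation: the function names are hypothetical, fingerprints are represented simply as sets of integer feature IDs, the per-feature weight is a smoothed log-ratio chosen for illustration, and standardization is assumed to be a z-score against the random background set. It also assumes the >0 cutoff applies to the raw model score while the ranking uses the standardized score, which the description above does not state explicitly.

```python
import math
from collections import Counter
from statistics import mean, stdev


def train_target_model(actives, all_compounds):
    """Build per-feature weights for one target.

    actives: feature sets of compounds active against the target.
    all_compounds: feature sets of every compound, actives included;
    the non-actives are assumed inactive.
    """
    base_rate = len(actives) / len(all_compounds)
    active_counts = Counter(f for fp in actives for f in fp)
    total_counts = Counter(f for fp in all_compounds for f in fp)
    # Smoothed log-ratio (an assumed weighting): positive when a feature
    # occurs among the actives more often than the base rate predicts.
    return {
        f: math.log((active_counts[f] + 1) / (total_counts[f] * base_rate + 1))
        for f in total_counts
    }


def score(model, fingerprint):
    """Raw model score: sum of the weights of the features present."""
    return sum(model.get(f, 0.0) for f in fingerprint)


def standardize(raw_score, background_scores):
    """Z-score of a raw score against scores of a random compound set."""
    return (raw_score - mean(background_scores)) / stdev(background_scores)


def rank_targets(models, backgrounds, fingerprint, top_n=50):
    """Return up to top_n (target, standardized score) pairs, best first.

    Only targets whose raw model score is above zero are kept.
    """
    hits = []
    for target, model in models.items():
        raw = score(model, fingerprint)
        if raw > 0:
            hits.append((target, standardize(raw, backgrounds[target])))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_n]
```

In this sketch, `models` would hold one entry per protein target (1,287 in the run described here) and `backgrounds` the raw scores of the random compound set under each model.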
Attached Files
Malaria Box Prediction