17th June 2017 @ 05:20

In order to support GitHub issue #504, I used my previously described Matched Molecular Pairs (MMP) approach, coupled with ECFP regression, to create and prioritize benzylic amine substitutes. The MMP approach has been described here.

In this case, the lead compound had the SMILES:

FC(F)OC1=CC=C(C=C1)C1=NN=C2C=NC=C(COC(=O)C3=CC=CC=C3)N12

When fed to the MMP program, this generated 285,618 compounds. I then filtered for the compounds which had the amine at the desired position, which reduced the pool to 47 compounds. All of the compounds can be found here.

These compounds then had their EC50 predicted by an MLP trained on the 125 compounds from the Master List which were designated 'series 4' and had appropriate EC50 values.

11th May 2017 @ 04:18

The final test set of 400 compounds for the PfATP4 competition has been released here. As such, I appended all of the compounds in the "MASTER SHEET" portion of the spreadsheet to 'modelB.csv' (described in my earlier notebook entry), labeled them as test compounds (i.e. set "C"), and ran the prediction regression.

 

Results 

The potency predictions are reported below in rank order, with most potent at the top. 

ID log(ec50) EC50
MMV006239 -0.663722 0.2169093434
MMV407834 -0.49227 0.3219067377
MMV085230 -0.435151 0.3671546635
MMV047015 -0.396195 0.4016103405
MMV687807 -0.380073 0.4167995081
MMV006372 -0.301316 0.499671116
MMV687246 -0.288966 0.5140839317
MMV019189 -0.27593 0.529748656
MMV000063 -0.272942 0.5334058628
MMV001561 -0.271227 0.5355171519
MMV671636 -0.264702 0.5436227464
MMV676449 -0.25232 0.5593447438
MMV687273 -0.24481 0.5691022634
MMV688955 -0.223547 0.5976590019
MMV019087 -0.222331 0.5993346888
MMV688270 -0.221657 0.6002654537
MMV006833 -0.219929 0.6026576757
MMV689480 -0.217988 0.605357519
MMV000016 -0.217851 0.6055482434
MMV020165 -0.210743 0.6155404568
MMV688943 -0.208393 0.6188800963
MMV1028806 -0.207916 0.6195610793
MMV675968 -0.20761 0.619997785
MMV676536 -0.202214 0.6277493197
MMV676461 -0.199174 0.632157808
MMV688283 -0.197475 0.6346367574
MMV000023 -0.193034 0.6411587736
MMV020388 -0.192813 0.6414856069
MMV688703 -0.191242 0.6438099622
MMV020321 -0.190099 0.6455067979
MMV676468 -0.188977 0.6471765513
MMV676445 -0.188355 0.6481037796
MMV676411 -0.182692 0.6566104648
MMV676053 -0.181413 0.6585470994
MMV688362 -0.174245 0.6695075011
MMV024443 -0.170308 0.6756033085
MMV688761 -0.167301 0.680298304
MMV689437 -0.163373 0.6864779221
MMV675996 -0.162806 0.687375888
MMV008439 -0.162436 0.6879617217
MMV688762 -0.156979 0.6966609016
MMV676603 -0.154259 0.7010378002
MMV676159 -0.152876 0.7032725277
MMV688550 -0.146764 0.7132399518
MMV000907 -0.146491 0.7136896462
MMV676409 -0.136271 0.730682719
MMV024114 -0.132993 0.7362195234
MMV032995 -0.132344 0.7373206459
MMV676191 -0.129365 0.7423955253
MMV026550 -0.128507 0.7438631953
MMV676602 -0.128404 0.7440393237
MMV676472 -0.126018 0.7481385646
MMV024829 -0.124164 0.7513383045
MMV001493 -0.122058 0.7549905428
MMV676584 -0.118846 0.7605952056
MMV676492 -0.11828 0.7615883065
MMV003152 -0.117239 0.7634152352
MMV688754 -0.112376 0.772011279
MMV676588 -0.111996 0.7726875648
MMV688768 -0.110856 0.7747182245
MMV676558 -0.110553 0.775258815
MMV688273 -0.107241 0.7811934629
MMV020081 -0.105644 0.7840715431
MMV676401 -0.104831 0.7855404718
MMV002529 -0.103341 0.7882405192
MMV090930 -0.101495 0.7915985859
MMV010764 -0.100271 0.7938322624
MMV687801 -0.100033 0.7942683837
MMV687248 -0.0997557 0.79477522
MMV676526 -0.0967732 0.8002519904
MMV030734 -0.0960928 0.8015067145
MMV652003 -0.0945526 0.8043543797
MMV007133 -0.0935901 0.8061389205
MMV688994 -0.0930928 0.8070625741
MMV020591 -0.0906478 0.8116189729
MMV007638 -0.0896169 0.8135477869
MMV687254 -0.0882549 0.8161031824
MMV676260 -0.0874965 0.8175297027
MMV690027 -0.085387 0.821510348
MMV007803 -0.0840281 0.8240847457
MMV099637 -0.0836896 0.8247272702
MMV676555 -0.0829908 0.826055443
MMV202458 -0.0826616 0.8266817618
MMV020623 -0.0793795 0.8329530512
MMV020537 -0.0780813 0.8354466893
MMV676512 -0.0766954 0.8381169375
MMV687796 -0.0752439 0.8409228085
MMV023949 -0.0749996 0.8413959009
MMV676470 -0.0746362 0.8421003322
MMV676379 -0.0746194 0.8421328958
MMV676063 -0.0741979 0.8429505758
MMV689758 -0.0721812 0.8468740829
MMV687812 -0.0719042 0.8474144151
MMV020517 -0.0697416 0.8516446365
MMV689243 -0.0697075 0.8517114089
MMV687188 -0.069515 0.8520890276
MMV016136 -0.0687134 0.8536632261
MMV023183 -0.067957 0.8551514243
MMV690028 -0.066878 0.8572786079
MMV689709 -0.0628856 0.8651957112
MMV688469 -0.0619042 0.8671530945
MMV461553 -0.0615702 0.8678202508
MMV1030799 -0.0606314 0.8696983435
MMV687703 -0.0605446 0.8698720617
MMV676358 -0.0605285 0.8699044008
MMV553002 -0.0586204 0.8737347909
MMV688509 -0.0581968 0.8745873713
MMV675969 -0.0579905 0.8750029177
MMV687747 -0.055981 0.8790610591
MMV688330 -0.055632 0.8797676076
MMV676383 -0.0554833 0.880069035
MMV595321 -0.054179 0.8827159541
MMV688467 -0.0529625 0.8851920237
MMV687189 -0.0529473 0.8852230721
MMV659004 -0.0516271 0.8879180063
MMV687813 -0.0508994 0.8894070984
MMV690102 -0.0507498 0.8897135075
MMV676589 -0.0474766 0.8964443992
MMV689061 -0.0470264 0.8973742902
MMV676386 -0.0466438 0.8981651482
MMV676050 -0.0464202 0.898627786
MMV687170 -0.0458459 0.8998168137
MMV011229 -0.0453809 0.9007806689
MMV023985 -0.0452078 0.9011397854
MMV688766 -0.0450147 0.9015406258
MMV688756 -0.0434292 0.9048380058
MMV019551 -0.0423489 0.9070914066
MMV054312 -0.0416885 0.9084719457
MMV661713 -0.0415967 0.9086639779
MMV688773 -0.0403678 0.9112388928
MMV021375 -0.0383425 0.9154981127
MMV676380 -0.0363614 0.9196839106
MMV659010 -0.0352781 0.9219809108
MMV689000 -0.0346916 0.9232268687
MMV676539 -0.0346761 0.9232596866
MMV676186 -0.0338281 0.9250643223
MMV637229 -0.0336294 0.9254875855
MMV688122 -0.0335506 0.9256554951
MMV688941 -0.0332023 0.9263981194
MMV024035 -0.0325354 0.9278219061
MMV007920 -0.0324473 0.9280101398
MMV102872 -0.030723 0.9317019606
MMV690103 -0.0304792 0.9322250607
MMV637953 -0.0292842 0.9347938468
MMV676600 -0.0292179 0.9349365303
MMV688889 -0.0272594 0.9391621175
MMV019721 -0.026997 0.9397298343
MMV688364 -0.0263608 0.9411074646
MMV688774 -0.0262946 0.9412508939
MMV020120 -0.0258985 0.9421097871
MMV676877 -0.0258348 0.9422480307
MMV687146 -0.0258299 0.9422585621
MMV000858 -0.0237598 0.9467605442
MMV020291 -0.0218939 0.9508370428
MMV688179 -0.0217178 0.9512227367
MMV687243 -0.021089 0.9526008975
MMV676008 -0.0202795 0.9543780894
MMV688555 -0.0182815 0.9587788952
MMV021057 -0.0176341 0.9602093456
MMV023388 -0.0165055 0.9627078162
MMV688958 -0.0155807 0.9647599404
MMV002817 -0.0149127 0.9662450312
MMV495543 -0.014285 0.9676426908
MMV011903 -0.0142118 0.9678057918
MMV676509 -0.0135656 0.9692468125
MMV022029 -0.0131284 0.9702230061
MMV026468 -0.0121352 0.9724443908
MMV676431 -0.0121099 0.9725010725
MMV032967 -0.0116126 0.973615252
MMV026490 -0.0110064 0.9749752385
MMV688921 -0.0104071 0.9763215351
MMV188296 -0.0099718 0.9773006689
MMV676382 -0.00994214 0.9773674299
MMV676182 -0.00988629 0.9774931237
MMV002816 -0.00954815 0.9782544902
MMV611037 -0.00923053 0.9789701852
MMV009054 -0.00867309 0.9802275613
MMV006741 -0.00855426 0.9804958152
MMV062221 -0.00788165 0.9820155065
MMV021013 -0.0063922 0.9853892055
MMV676474 -0.00582961 0.9866665119
MMV024397 -0.00558698 0.9872179058
MMV688991 -0.00545839 0.9875102486
MMV146306 -0.00221443 0.9949140556
MMV020512 -0.00165371 0.9961994402
MMV688125 -0.000316227 0.9992721258
MMV688942 -0.000250081 0.9994243339
MMV676604 0.000105577 1.0002431286
MMV688410 0.000415908 1.0009581227
MMV011511 0.000596916 1.0013753956
MMV006901 0.000977656 1.0022536716
MMV001625 0.0015283 1.0035252407
MMV688844 0.00188613 1.0043524264
MMV688978 0.00223464 1.0051587084
MMV658988 0.00489315 1.0113306095
MMV161996 0.0053967 1.0125038923
MMV228911 0.00540872 1.0125319236
MMV688755 0.00701175 1.0162761768
MMV676597 0.00757768 1.0176013587
MMV688793 0.00851536 1.0198008183
MMV1088520 0.00970522 1.0225986647
MMV687762 0.0101459 1.0236367803
MMV019807 0.0109656 1.0255706945
MMV688891 0.0109853 1.0256171621
MMV023953 0.0112511 1.0262451024
MMV687776 0.0117397 1.0274002202
MMV676480 0.0119228 1.0278335856
MMV688553 0.0120318 1.0280915461
MMV687138 0.0125716 1.0293701526
MMV688352 0.0138485 1.0324012258
MMV689244 0.0148509 1.0347868863
MMV688853 0.0165983 1.038958687
MMV687696 0.0176987 1.0415944893
MMV688466 0.0182415 1.0428972513
MMV688470 0.0189316 1.0445557173
MMV687803 0.0189788 1.0446691659
MMV676599 0.0192072 1.0452187876
MMV023370 0.0193342 1.0455244901
MMV392832 0.0195071 1.0459408096
MMV687706 0.0198025 1.0466525747
MMV688990 0.0199586 1.0470286129
MMV676439 0.020446 1.0482044986
MMV676477 0.0216716 1.0511667377
MMV1110498 0.0219526 1.051847042
MMV688543 0.0224976 1.0531679212
MMV676881 0.0233318 1.055192836
MMV688411 0.0233322 1.0551936687
MMV001499 0.0240083 1.0568377479
MMV020152 0.0242473 1.057419377
MMV688372 0.0245463 1.0581476304
MMV688415 0.0248699 1.05893657
MMV019234 0.0254288 1.060299994
MMV688274 0.0278572 1.0662454203
MMV688776 0.0295673 1.0704521583
MMV200748 0.0308534 1.0736269296
MMV024195 0.0315951 1.0754620834
MMV023860 0.031614 1.0755087818
MMV688888 0.0323296 1.0772825716
MMV688854 0.0329199 1.0787477024
MMV393995 0.0341972 1.0819250631
MMV676389 0.0352057 1.084440446
MMV676520 0.0354993 1.0851738677
MMV688371 0.0362157 1.0869653319
MMV004168 0.0370056 1.0889441803
MMV010545 0.0377588 1.0908344425
MMV687729 0.0381533 1.0918256447
MMV676048 0.0385234 1.0927564809
MMV687700 0.0390438 1.0940667735
MMV688313 0.0393761 1.0949041701
MMV023227 0.0428561 1.103712789
MMV011765 0.0433114 1.1048706182
MMV688938 0.0436767 1.1058001691
MMV007471 0.0443609 1.1075437318
MMV688552 0.0445292 1.1079729619
MMV687730 0.0486716 1.1185917851
MMV009135 0.0497424 1.1213532481
MMV687775 0.050991 1.1245816076
MMV020320 0.051655 1.1263023996
MMV676269 0.0520492 1.1273250373
MMV676270 0.0520492 1.1273250373
MMV676478 0.0525201 1.1285482549
MMV016838 0.0555302 1.1363972149
MMV687794 0.0560505 1.137759465
MMV688980 0.0565772 1.1391403695
MMV676605 0.0591517 1.1459132637
MMV676554 0.0591881 1.1460092028
MMV675993 0.0601665 1.1485938474
MMV020670 0.0602955 1.1489351066
MMV676395 0.0603884 1.1491808461
MMV085071 0.0613554 1.1517426298
MMV676204 0.0614497 1.1519927445
MMV688472 0.0619778 1.1533941691
MMV676406 0.0643935 1.1598276772
MMV676571 0.0645061 1.1601283879
MMV676057 0.0648739 1.1611115395
MMV688939 0.0669817 1.16676044
MMV021660 0.0680149 1.1695394283
MMV688557 0.0690728 1.1723918693
MMV688471 0.0697636 1.1742582986
MMV688797 0.0703808 1.1759281438
MMV667494 0.0705631 1.1764218789
MMV023969 0.0710687 1.1777921811
MMV676377 0.072505 1.1816938287
MMV031011 0.0746406 1.1875190955
MMV1019989 0.0748595 1.1881177933
MMV688554 0.0757791 1.1906363367
MMV026356 0.0797433 1.2015540684
MMV688775 0.081523 1.2064879641
MMV688178 0.0830692 1.2107910798
MMV687749 0.0832295 1.2112379438
MMV688417 0.0834606 1.2118827996
MMV676384 0.0871109 1.2221117559
MMV028694 0.0879489 1.2244719795
MMV045105 0.0906374 1.2320757455
MMV019742 0.0942934 1.242491469
MMV688845 0.0961384 1.2477809493
MMV688360 0.0967192 1.2494507821
MMV560185 0.0982449 1.2538480402
MMV688934 0.0983802 1.2542385813
MMV688180 0.0990334 1.2561265868
MMV676442 0.101919 1.2645006661
MMV407539 0.102138 1.2651389561
MMV687765 0.102332 1.2657043035
MMV1198433 0.103118 1.2679958468
MMV688936 0.107439 1.2806751687
MMV012074 0.108203 1.2829299437
MMV676161 0.10882 1.2847545009
MMV688474 0.108861 1.2848757306
MMV020289 0.11331 1.2981051344
MMV687800 0.113602 1.2989792051
MMV676350 0.115747 1.305410724
MMV022236 0.11576 1.3054499608
MMV011691 0.1169 1.3088798023
MMV676476 0.117598 1.3109853116
MMV024937 0.117606 1.3110105014
MMV668727 0.119508 1.3167636884
MMV688796 0.120316 1.3192166998
MMV001059 0.12376 1.3297202285
MMV688846 0.128128 1.3431606573
MMV1236379 0.132416 1.3564887263
MMV687172 0.132714 1.3574184103
MMV676528 0.134326 1.3624677135
MMV676162 0.135047 1.3647296158
MMV687180 0.137385 1.3720963009
MMV676444 0.137788 1.3733726208
MMV658993 0.139632 1.3792141953
MMV085210 0.141658 1.3856654726
MMV020982 0.145807 1.3989658633
MMV663250 0.146525 1.4012810498
MMV676412 0.147783 1.4053449477
MMV634140 0.149923 1.4122867271
MMV688407 0.151387 1.4170572406
MMV084864 0.153324 1.4233907106
MMV689029 0.159039 1.4422432957
MMV688416 0.161281 1.4497111077
MMV676064 0.16211 1.4524795638
MMV688771 0.164229 1.4595843039
MMV007625 0.165056 1.462365192
MMV022478 0.165915 1.4652625321
MMV003270 0.167121 1.4693347665
MMV675995 0.169207 1.4764098946
MMV676524 0.173317 1.4904497653
MMV688262 0.175494 1.497938405
MMV000062 0.17945 1.5116442209
MMV019790 0.179731 1.5126230475
MMV688361 0.180063 1.5137815853
MMV676501 0.181709 1.5195304549
MMV010576 0.183937 1.5273445275
MMV153413 0.186822 1.537525017
MMV084603 0.187522 1.5400028242
MMV020710 0.187704 1.5406499249
MMV689255 0.193864 1.5626597822
MMV689028 0.194443 1.5647424626
MMV053220 0.195067 1.5669916258
MMV393144 0.195694 1.5692554904
MMV687699 0.197089 1.5743043252
MMV688124 0.205609 1.6054945837
MMV024101 0.214607 1.6391056252
MMV688795 0.217925 1.6516765852
MMV689060 0.220148 1.6601527221
MMV688763 0.23176 1.7051412189
MMV688852 0.234564 1.7161835629
MMV688508 0.235226 1.7188022594
MMV024311 0.238548 1.7320000147
MMV202553 0.239915 1.7374592434
MMV688271 0.241341 1.7431746654
MMV063404 0.244633 1.7564398718
MMV688327 0.245387 1.7594919571
MMV676388 0.246194 1.7627640457
MMV688279 0.247378 1.7675752111
MMV687239 0.254092 1.7951129067
MMV688798 0.254456 1.7966182413
MMV1029203 0.258445 1.8131970119
MMV687251 0.260949 1.8236802205
MMV024406 0.262146 1.8287142416
MMV688345 0.268814 1.8570074236
MMV675997 0.280822 1.9090717332
MMV069458 0.28164 1.9126719984
MMV688548 0.283081 1.9190272605
MMV688704 0.284786 1.926576084
MMV676398 0.287079 1.9367746667
MMV000011 0.292684 1.961931364
MMV272144 0.304905 2.0179253063
MMV020136 0.31137 2.0481890734
MMV675994 0.380393 2.401006777
MMV019993 0.384626 2.4245201201
MMV020391 0.414966 2.599953971
MMV687798 0.432491 2.7070188048
MMV085499 0.486487 3.0654018729
MMV026020 0.492975 3.1115391146
MMV1037162 0.499992 3.1622175509
MMV020520 0.525354 3.352386993
MMV688514 0.545498 3.5115412552
MMV688350 0.559441 3.6261119565
MMV675998 0.572442 3.7362996319
MMV688547 0.574383 3.7530390222
MMV026313 0.593654 3.9233216724
MMV023233 0.621736 4.1853922795
MMV019838 0.62528 4.2196883833
MMV687145 0.713423 5.16919333
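
The EC50 column above is the log(ec50) column raised as a power of 10. A minimal sketch of that conversion, assuming the log is base 10 and the potency is in µM (as in the earlier entries); the function name is illustrative and not taken from the original script:

```python
def ec50_from_log(log_ec50: float) -> float:
    """Convert a base-10 log potency back to a potency (assumed to be in uM)."""
    return 10.0 ** log_ec50


# First and last rows copied from the table above: (ID, log(ec50)).
rows = [("MMV006239", -0.663722), ("MMV687145", 0.713423)]
for compound_id, log_ec50 in rows:
    print(compound_id, round(ec50_from_log(log_ec50), 4))
```

A negative log value therefore corresponds to a predicted EC50 below 1 µM.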

 

 


A link to the full CSV can be found here. Most interesting to me is the prevalence of spiroindolones near the top of the potency chart; these have previously been characterized in the literature as potent anti-malarial compounds against PfATP4 [1,2].

 

References 

[1] https://www.ncbi.nlm.nih.gov/pubmed/20813948

[2] https://www.nature.com/articles/srep27806

13th February 2017 @ 22:37

Goal 

Generate new compound ideas that are "lead like" around the Series 4 frontrunner compound using matched molecular pair analysis (MMPA), as seen in the literature (1), combined with a previously made MLP potency predictor.

Approach 

Using the RDKit code and approach detailed in (1), I was able to create a library of 'common' transformations as seen in the ChEMBL database of compounds (chembl_22_1, found here: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_22_1/), which amounts to ~1.67 million compounds. After the SMIRKS/SMARTS patterns were extracted for all of the transformations, I enumerated a new library of compounds based on (what seems to be) a lead compound for series 4:

MMV669844: O=C(NC1=CC=NC(C(F)(F)F)=C1)C2=CN=CC3=NN=C(C4=CC=C(OC(F)F)C=C4)N32

Next, my MLP potency prediction algorithm (described here: http://malaria.ourexperiment.org/in_silico_pfatp4_mo) was applied to this library; here, I used 'modelB.csv' as my model, with the notable exception that all compounds in test set 'B' were labeled 'A', and thus included in the training. The new compounds' potencies were then predicted; all compounds with predicted sub-micromolar potency (i.e. pEC50 < 0) were then selected, and filtered for compounds that have the following properties:

logP < 5

200 < molecular weight < 500

# Hydrogen Bond Donors < 5

# Hydrogen Bond Acceptors < 10
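
The selection step above can be sketched as a simple predicate. This is an illustration only: the `CompoundProperties` container and function names are hypothetical, and the descriptor values are assumed to have been computed beforehand (for example with RDKit's Crippen logP, which is what the results table reports).

```python
from dataclasses import dataclass


@dataclass
class CompoundProperties:
    """Pre-computed properties for one compound (hypothetical container)."""
    log_p: float        # e.g. Crippen logP
    mol_weight: float   # molecular weight in Da
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors


def is_lead_like(p: CompoundProperties) -> bool:
    """Apply the four property cut-offs listed above (all strict inequalities)."""
    return (
        p.log_p < 5
        and 200 < p.mol_weight < 500
        and p.h_donors < 5
        and p.h_acceptors < 10
    )


def keep_candidate(predicted_pec50: float, p: CompoundProperties) -> bool:
    """Keep compounds predicted to be sub-micromolar that also pass the filter."""
    return predicted_pec50 < 0 and is_lead_like(p)


# Made-up property values, for illustration only.
example = CompoundProperties(log_p=3.9, mol_weight=375.8, h_donors=0, h_acceptors=6)
print(keep_candidate(-0.47, example))
```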


Results 

Here are the compounds created:

SMILES Cmpd ID Predicted pEC50 Crippen LogP
OC(COc1cncc2nnc(-c3ccc(-c4ccccc4)cc3)n12)c1ccc(F)c(F)c1 END2 -0.510295 4.8489
N#Cc1ccc(-c2nnc3cncc(OCCc4ccccc4Cl)n23)cc1 END1 -0.470482 3.93788
CC(O)(COc1cncc2nnc(-c3ccc(C#N)cc3)n12)c1ccc(F)c(F)c1 END0 -0.415304 3.22768
N#Cc1ccc(-c2nnc3cncc(OCCc4ccc(F)c(F)c4)n23)cc1 END3 -0.374209 3.56268
COC1=CC(=O)C=C(c2ccc(F)c(F)c2)C1=O END5 -0.35541 2.0303
CC(O)(c1ccc(Cl)cc1)c1nnc2cncc(OCC(O)c3ccc(F)c(F)c3)n12 END6 -0.339275 3.4241
N#Cc1ccc(-c2nnc3cncc(OCCc4ccc(OF)c(Cl)c4)n23)cc1 END9 -0.314145 4.20118
N#Cc1ccc(-c2nnc3cncc(OCC(C(=O)O)C(O)Cc4ccc(F)c(F)c4)n23)cc1 END7 -0.309288 2.62438
CC(c1ccc(CO)cc1)c1ccc(F)c(F)c1 END8 -0.282801 3.6089
CC[Si](CC)(CC)c1nnc2cncc(OCC(O)c3ccc(F)c(F)c3)n12 END12 -0.28257 3.2304
CC1OOC(c2ccc(-c3nnc4cncc(OCC(O)c5ccc(F)c(F)c5)n34)cc2)C2OC12 END19 -0.277422 3.3407
N#Cc1ccc(-c2nnc3cncc(OCCc4ccccc4)n23)cc1 END10 -0.246898 3.28448
N#Cc1ccc(-c2nnc3cncc(OCCc4cccc(Cl)c4)n23)cc1 END14 -0.227092 3.93788
Cn1nc(C=Cc2ccc(F)c(F)c2)ccc1=O END11 -0.22469 2.2289
OC(CNCc1nnn[nH]1)c1ccc(F)c(F)c1 END20 -0.217257 0.3011
N#Cc1ccc(-c2nnc3cncc(OCC(O)c4ccc(F)c(F)c4)n23)cc1 END15 -0.212341 3.05358
O=C(CO)NC1CCCCC1c1ccc(F)c(F)c1 END21 -0.202371 2.0995
N#Cc1ccc(-c2nnc3cncc(OCC(F)c4ccc(F)c(O)c4)n23)cc1 END13 -0.199242 3.59728
Fc1ccc(C=NOCc2ccccc2)cc1F END23 -0.194933 3.5155
OCC[N+]1(CC(O)c2ccc(F)c(F)c2)CC1 END18 -0.183792 0.8209
OC(Cc1ccccc1)c1ccc(F)c(F)c1 END22 -0.1786 3.2409
CC(F)(F)CC(O)c1ccc(F)c(F)c1 END16 -0.168665 3.0435
N#Cc1ccc(-c2nnc3cncc(OCCc4ccc(F)cc4)n23)cc1 END28 -0.158248 3.42358
CCCNC(N)=NCCCC(O)c1ccc(F)c(F)c1 END24 -0.147586 2.0927
N#Cc1ccc(-c2nc3nc(O)c2CC=NC=C(OCC(O)c2ccc(F)c(F)c2)S3)cc1 END17 -0.142704 4.26718
N#Cc1ccc(-c2nnc3cncc(OCC(O)(F)Cc4cccc(Cl)c4)n23)cc1 END25 -0.138543 3.59598
N#Cc1ccc(-c2nnc3cncc(OCC(F)c4cccc(Cl)c4)n23)cc1 END27 -0.133425 4.40598
COc1ncccc1CC(O)COc1cncc2nnc(-c3ccc(C#N)cc3)n12 END29 -0.112781 2.04898
Oc1c(F)cc(F)c(Br)c1Cl END37 -0.111889 3.0863
N#Cc1ccc(-c2nnc3cncc(OCOc4ccc5c(F)csc5c4)n23)cc1 END33 -0.110806 4.43208
FOc1ccc(Br)cc1Cl END36 -0.108331 3.3658
N#Cc1ccc(-c2nnc3cncc(OCC(O)c4cc(F)cc(C#N)c4)n23)cc1 END30 -0.0900954 2.78616
N#Cc1ccc(-c2nnc3cncc(OCC(O)c4cc(F)cc(Cl)c4)n23)cc1 END38 -0.0889663 3.56788
OCC(Cc1ccccc1)Nc1ccc(F)c(F)c1 END34 -0.0844837 2.9803
N#Cc1ccc(-c2nnc3cncc(OCCc4ccncc4)n23)cc1 END39 -0.0818176 2.67948
N#Cc1ccc(-c2nnc3cncc(OCCCOO)n23)cc1 END32 -0.0715722 1.92148
CN(C(=O)c1ccc(F)c(F)c1)c1cccnc1 END26 -0.0694662 2.6364
N#Cc1ccc(-c2nnc3cnc(OCC(O)c4ccc(F)c(F)c4)cn23)cc1 END35 -0.0591146 3.05358
N#Cc1ccc(-c2nnc3cncc(OCC(=O)c4ccc(Cl)cc4Cl)n23)cc1 END40 -0.0585438 4.23148
N#Cc1ccc(-c2nnc3cncc(OCC(O)c4ccc(Cl)c(F)c4)n23)cc1 END31 -0.054634 3.56788

 

 

Currently, the model seems to have a potency accuracy of ~0.6, meaning that the difference in predicted potency needs to be greater than 0.65 in order to comfortably say that one compound is more potent than the other. The compound IDs here were generated internally.

(1) Hussain, J., & Rea, C. (2010). "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets." Journal of Chemical Information and Modeling, 50(3), 339-348.

 

 

 

 

5th February 2017 @ 19:27

Goal 

Predict whole-cell pEC50 against Pfal

Data 

I will use the data found here; specifically, I will train 2 separate models. In the first model (from here on referred to as "Model A"), I will only train on compounds which meet the following criteria:

  • Are labeled only "A", "Lit", or "M" (and not "B" or "C") in the "Ion regulation Test Set" column
  • Do not have an empty or "ND" value in the "Potency vs. Parasite (uMol)" column 
  • Do not have a potency value modifier (e.g. '>' or '<') 
  • Do not belong to multiple groups in "Ion regulation Test Set" 

This filter results in a total of 333 "Training" compounds and 19 "Testing" compounds. For a second model (from here on referred to as "Model B"), I will use the following criteria:

  • Are not labeled "B" or "C" in the "Ion regulation Test Set" column 
  • Do not have an empty or "ND" value in "Potency vs. Parasite (uMol)" column 
  • Do not belong to multiple groups in "Ion regulation Test Set" 

This results in 565 training compounds and 34 test compounds.
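
The two sets of criteria above can be sketched as row filters. The column names come from the spreadsheet; everything else is an assumption for illustration: rows are treated as dictionaries (as `csv.DictReader` would give), membership of multiple groups is assumed to be written as a comma-separated label, and Model B is assumed to accept modified values such as '>10' because that criterion is absent from its list.

```python
ION_COL = "Ion regulation Test Set"
POTENCY_COL = "Potency vs. Parasite (uMol)"


def has_usable_potency(row: dict, allow_modifiers: bool) -> bool:
    """True if the potency cell is filled in, not 'ND', and (optionally) unmodified."""
    value = row.get(POTENCY_COL, "").strip()
    if value == "" or value.upper() == "ND":
        return False
    if not allow_modifiers and value[0] in "<>":
        return False
    return True


def is_training_row(row: dict, model: str) -> bool:
    """True if a spreadsheet row qualifies as training data for model 'A' or 'B'."""
    labels = [s.strip() for s in row.get(ION_COL, "").split(",") if s.strip()]
    if len(labels) > 1:              # belongs to multiple groups
        return False
    label = labels[0] if labels else ""
    if label in ("B", "C"):          # held out as test sets
        return False
    if model == "A":
        # Model A: only "A", "Lit" or "M", and no '>' or '<' modifiers.
        return label in ("A", "Lit", "M") and has_usable_potency(row, False)
    # Model B: any remaining row with a potency value, modifiers allowed.
    return has_usable_potency(row, True)
```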

Approach 

I will take all of the compounds with relevant data, convert the SMILES strings to ECFP4 (circular fingerprints; note that by the usual convention the "4" denotes a diameter of 4, i.e. a radius of 2), convert to a 1024-bit vector, and use these as input for a sequential neural network (as implemented in Keras). The properties of the neural network are:

  • # Layers: 4 (1 input, 1 dropout, 1 hidden, 1 output)
  • Dimensions of layers: 1024, 1024 (with 20% dropout rate), 1024, 1
  • Activation: ReLU
  • initialization: normal
  • optimizer: Adam 
  • batch size: 45
  • # Epochs: 1000
  • loss: mean absolute error 

The model will then be evaluated using B and C "Test Set" molecules. 
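
A minimal Keras sketch of a network with the properties listed above. This is one possible reading of the layer list, not the original script: the dropout is assumed to sit after the first 1024-unit layer, the output is assumed to be linear, and "normal" initialization is mapped to Keras' `random_normal` initializer. `SETTINGS` and `build_model` are illustrative names.

```python
# Settings copied from the list above.
SETTINGS = {
    "n_bits": 1024,          # fingerprint length = input width
    "hidden_units": 1024,
    "dropout_rate": 0.2,
    "batch_size": 45,
    "epochs": 1000,
    "loss": "mean_absolute_error",
    "optimizer": "adam",
}


def build_model(settings=SETTINGS):
    """Build and compile the regression network (assumed layer ordering)."""
    # Imported here so the settings can be inspected without Keras installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(settings["n_bits"],)),
        keras.layers.Dense(settings["hidden_units"], activation="relu",
                           kernel_initializer="random_normal"),
        keras.layers.Dropout(settings["dropout_rate"]),
        keras.layers.Dense(settings["hidden_units"], activation="relu",
                           kernel_initializer="random_normal"),
        # Single linear output unit: the predicted pEC50 (an assumption).
        keras.layers.Dense(1, kernel_initializer="random_normal"),
    ])
    model.compile(optimizer=settings["optimizer"], loss=settings["loss"])
    return model
```

Training would then be along the lines of `build_model().fit(X, y, batch_size=SETTINGS["batch_size"], epochs=SETTINGS["epochs"])`, where `X` is an array of fingerprint bit vectors and `y` the matching pEC50 values.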

Results 

For Model A, the overall loss (MAE) for the training averages 0.012 (over 5 runs), and the average loss across the predictions was 0.78. Predictions for the test set are as follows: 

OSM Title SMILES Predicted pEC50
OSM-S-175 O=C(NC1=CC=NC(C(F)(F)F)=C1)C2=CN=CC3=NN=C(C4=CC=C(OC(F)F)C=C4)N32 -0.6018518806
OSM-S-379 FC(C(Cl)=C1)=CC=C1NC(C2=CN=CC3=NN=C(C4=CC=C(OC(F)F)C=C4)N32)=O -0.5310548544
OSM-S-201 O=C(NC1=C(C)C(Cl)=CC=C1)C2=CN=CC(N23)=NN=C3C4=CC=C(OC(F)F)C=C4 -0.4488587379
OSM-S-366 C(COc1cncc2n1c(nn2)c1cnc(cc1)C(F)(F)F)c1cc(c(cc1)F)F -0.0910440534
OSM-S-218 FC1=C(F)C=CC(C(OC)COC2=CN=CC3=NN=C(C4=CC=C(C#N)C=C4)N32)=C1 -0.0042779259
OSM-S-376 OC(C1=CC(F)=C(F)C=C1)COC2=CN=CC3=NN=C(C4=CC=C(C#N)C=C4)N32 0.007357562
OSM-S-272 FC1=C(F)C=CC(CCOC2=CN=CC3=NN=C(C4=CC=C(Cl)C=C4)N32)=C1 0.0366517603
OSM-S-386 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OC(C)C4=CC=C(C#N)C=C4)N32 0.0693697035
OSM-S-385 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OC(C)C4=CC=C(F)C(F)=C4)N32 0.1233436689
OSM-S-293 ClC(C=C1)=CC=C1C2=NN=C3N2C(OCCC4=CC=CC=C4)=CN=C3 0.1889874637
OSM-S-390 FC(OC1=CC=C(C2=NN=C3C=NC=C(N32)OCC(C4=CC(F)=C(C=C4)F)O)C=C1)F 0.3146250844
OSM-S-371 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4(C56)C7C6C8C5C4C87)N32 0.3437955081
OSM-S-353 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCC(C4=CC=CC=C4)CO)N32 0.3452663124
OSM-S-383 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4=CC(OC)=CC=C4)N32 0.3493134975
OSM-S-279 OC(C1=CC=CC=C1)COC2=CN=CC3=NN=C(C4=CC=C(OC(F)F)C=C4)N32 0.36657691
OSM-S-389 FC1=C(C=CC(C(COC2=CN=CC3=NN=C(N32)C4=CC=C(OC(F)F)C=C4)N(C)C)=C1)F 0.3937544823
OSM-S-384 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4=CC=C(OC)C=C4)N32 0.4351124465
OSM-S-369 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4=CC=CC=C4)N32 0.4595601559
OSM-S-370 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCCC45C6C7C4C8C5C6C87)N32 0.6245583296

 

Here, an average loss of 0.78 on the test set represents a ~6-fold window (10^0.78 ≈ 6) within which a compound's predicted potency is accurate, meaning that a separation of more than ~0.78 in pEC50 values is needed between two compounds to reliably determine rank-order potency.
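
The fold window quoted here follows from the error being in log10 units. A short sketch of the arithmetic; the function name is illustrative:

```python
def fold_window(mae_log10: float) -> float:
    """Fold-change in potency corresponding to an error given in log10 units."""
    return 10.0 ** mae_log10


# Test-set MAE reported for Model A.
print(f"MAE 0.78 -> ~{fold_window(0.78):.1f}-fold")
```

The same conversion applies to any MAE reported on the pEC50 scale in these entries.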

For Model B, the overall loss (MAE) was 0.05 on the training set, and the average loss on the test set was 0.61. Predictions for the test set are as follows:

OSM Title SMILES Predicted pEC50
OSM-S-218 FC1=C(F)C=CC(C(OC)COC2=CN=CC3=NN=C(C4=CC=C(C#N)C=C4)N32)=C1 -0.693947196
OSM-S-377 FC(C=C1)=C(F)C=C1[C@@H](OC(F)F)COC2=CN=CC3=NN=C(C4=CC=C(C#N)C=C4)N32 -0.6369754076
OSM-S-272 FC1=C(F)C=CC(CCOC2=CN=CC3=NN=C(C4=CC=C(Cl)C=C4)N32)=C1 -0.4158473015
OSM-S-378 FC1=C(F)C=CC(C(N(C)C)COC2=CN=CC3=NN=C(C4=CC=C(C#N)C=C4)N32)=C1 -0.391158253
OSM-S-376 OC(C1=CC(F)=C(F)C=C1)COC2=CN=CC3=NN=C(C4=CC=C(C#N)C=C4)N32 -0.3737068176
OSM-S-381 c1ncc2n(c1OCC(c1cc(c(cc1)F)F)CO)c(nn2)c1ccc(cc1)OC(F)F -0.3351149261
OSM-S-389 FC1=C(C=CC(C(COC2=CN=CC3=NN=C(N32)C4=CC=C(OC(F)F)C=C4)N(C)C)=C1)F -0.3097887337
OSM-S-390 FC(OC1=CC=C(C2=NN=C3C=NC=C(N32)OCC(C4=CC(F)=C(C=C4)F)O)C=C1)F -0.2679155469
OSM-S-366 C(COc1cncc2n1c(nn2)c1cnc(cc1)C(F)(F)F)c1cc(c(cc1)F)F -0.152574122
OSM-S-384 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4=CC=C(OC)C=C4)N32 -0.1496760994
OSM-S-369 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4=CC=CC=C4)N32 -0.1322912425
OSM-S-385 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OC(C)C4=CC=C(F)C(F)=C4)N32 -0.1301248074
OSM-S-375 IC12C(C3C1C4C52)C5C34C6=NN=C7C=NC=C(OCCC8=CC(F)=C(F)C=C8)N76 -0.117139101
OSM-S-383 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4=CC(OC)=CC=C4)N32 -0.1054382771
OSM-S-379 FC(C(Cl)=C1)=CC=C1NC(C2=CN=CC3=NN=C(C4=CC=C(OC(F)F)C=C4)N32)=O -0.0771714821
OSM-S-293 ClC(C=C1)=CC=C1C2=NN=C3N2C(OCCC4=CC=CC=C4)=CN=C3 -0.0625027046
OSM-S-371 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCC4(C56)C7C6C8C5C4C87)N32 -0.0088994242
OSM-S-254 O=C(NC1=CC(Cl)=CC=C1C)C2=CN=CC(N23)=NN=C3C4=CC=C(OC(F)F)C=C4 0.0126997996
OSM-S-368 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCC4=CC=CC=C4)N32 0.0295363646
OSM-S-353 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCC(C4=CC=CC=C4)CO)N32 0.0616748333
OSM-S-372 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCC4=CN(CC5=CC=CC=C5)N=N4)N32 0.0935347378
OSM-S-370 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCCCC45C6C7C4C8C5C6C87)N32 0.0995715931
OSM-S-387 FC(C=C1)=C(F)C=C1CCOC2=CN=CC3=NN=CN32 0.1199974492
OSM-S-386 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OC(C)C4=CC=C(C#N)C=C4)N32 0.1889441311
OSM-S-382 IC12C3C4C5(C3C1C5C24)C6=NN=C7C=NC=C(C(NC8=CC(C(F)(F)F)=NC=C8)=O)N76 0.2778000832
OSM-S-374 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCC4=CN(C5=NC(Cl)=CN=C5)N=N4)N32 0.3043126762
OSM-S-363 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(SCCC4=CC=CC=C4)N32 0.3521158695
OSM-S-388 OC1=CN=CC2=NN=C(N3CCCCC3)N21 0.3899840713
OSM-S-364 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(S(CCC4=CC=CC=C4)=O)N32 0.4322550595
OSM-S-373 FC(F)OC(C=C1)=CC=C1C2=NN=C3C=NC=C(OCC(N=N4)=CN4C5=CC=CC=C5)N32 0.4649647772
OSM-S-279 OC(C1=CC=CC=C1)COC2=CN=CC3=NN=C(C4=CC=C(OC(F)F)C=C4)N32 0.4917371869
OSM-S-204 O=C(NC1=C(F)C(Cl)=CC=C1)C2=CN=CC(N23)=NN=C3C4=CC=C(OC(F)F)C=C4 0.5700372458
OSM-S-278 OC(C1=CC=CC=C1)COC2=CN=CC3=NN=C(C4=CN=C(C(F)(F)F)C=C4)N32 0.6374928355
OSM-S-201 O=C(NC1=C(C)C(Cl)=CC=C1)C2=CN=CC(N23)=NN=C3C4=CC=C(OC(F)F)C=C4 0.8498254418

 

The script and data used to run this are available on my GitHub (here).


11th December 2016 @ 18:06

GOAL 

Predict whole-cell pEC50 against Pfal using neural networks

DATA

I will be using the data entries from here. Briefly, I will train on SMILES patterns which have a value (i.e. not empty or 'ND') in the "Potency vs. Parasite (uMol)" column (which are converted to pEC50 by log10(potency), so lower values mean higher potency), and are not labeled "B" or "C" in the "Ion Regulation Test Set" column. Then, I will validate the method on compounds labeled "B" or "C" in the "Ion Regulation Test Set" column. In total, there are 564 training compounds, and 

 

APPROACH

I will take all of the compounds with relevant data, convert the SMILES strings to ECFP4 (circular fingerprints; note that by the usual convention the "4" denotes a diameter of 4, i.e. a radius of 2), convert to a 1024-bit vector, and use these as input for a sequential neural network (as implemented in Keras). The properties of the neural network are:

  • # Layers: 3 (1 input, 1 hidden, 1 output)
  • Dimensions of layers: 1024,1024,1 
  • Activation: ReLU
  • initialization: normal
  • optimizer: Adam 
  • batch size: 50
  • # Epochs: 600
  • loss: mean absolute error 

The model will then be evaluated using B and C "Test Set" molecules. 

RESULTS 

The average results of the 5-fold validation (on test set only) are: 

  • Loss: 0.045
  • 'Score': 0.43

Here, 'Loss' is the Mean Absolute Error (MAE) on the training set (80%), and the 'score' is the MAE on the test split (20%) of the data. To clarify, an MAE of 0.43 in pEC50 is a ~2.7-fold window in potency accuracy. I'm quite happy with that accuracy, so I will now check the predictions on the B and C test sets, in two stages. First, some of the B compounds have reported EC50 values, so I will check the performance on those compounds, and then I will predict on the remaining unlabeled compounds.
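
For reference, a generic sketch of a 5-fold split with MAE scoring. The entry does not say how the folds were made, so this is an assumption-laden illustration (shuffled, equal-sized folds; illustrative function names) rather than the procedure actually used:

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# With 564 training compounds, each fold holds out roughly 20% of the data.
for train_idx, test_idx in k_fold_indices(564):
    print(len(train_idx), len(test_idx))
```

The 'Loss' and 'score' values would then be the MAE on `train_idx` and `test_idx` respectively, averaged over the five folds.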

The regression on the "B" test compounds resulted in these predictions. The average error across the set was 0.89, which is a ~7.8-fold potency window for the predictions. This is obviously worse than the performance we saw on the test split, but still (in my opinion) a useful prediction method, especially if the model is intended to separate "inactive" compounds (say at the 500 microM level) from those that are "active" (say at the <1 microM level).

Happy with those predictions, I then moved on to make predictions on the "C" compounds, which can be seen here. All of these "C" class predictions seem to be roughly equipotent with the "B" class compounds; given the high degree of similarity between these compounds and the training compounds, I feel quite confident in their predictions.