24th October 2013 @ 12:28

How do we best view a collection of molecular structures and their biological activities?

This is a central challenge we have in OSM. It would be good if the casual medicinal chemist could simply browse the project's structures.

Contributors are making molecules every day and, regularly but less frequently, receiving potencies or other biological/chemical data. We need to share the structures and the activities as effectively as possible: the data should be easy to share but also easy to browse. So we need a sheet (or something like it) with:

1) Structures, i.e. 2D pictures of the molecules that are human-friendly
2) Associated informatics data (e.g. InChI) that are machine-friendly
3) Potencies or other data
4) Any associated ID numbers
5) A weblink or two to where the molecule is featured/made

I think it's useful for the project to have a discrete place where the data are kept, just to maintain identity, i.e. so that they are not simply subsumed by a larger database. Or at least for the project's structures to be group-able if they are part of a larger database. But really this is a problem about human visualization.

The initial solution was a Google sheet, but we found that beyond about 50 structures the sheet didn't handle the images well.

An alternative is a shared Excel sheet, but as I understand it we would need a plugin to handle the chemical structures. That's doable, but we can't expect every reader to have the same plugin installed.

The current solution is an sd file, a succinct and easy-to-update text file that contains all the information. However, reading the data (i.e. browsing the structures) is not easy for the casual observer.
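To show why the sd file is machine-friendly but not human-friendly, here is a minimal sketch of reading one in plain Python. It assumes the usual SD layout: a molblock, then data items introduced by "> <NAME>" lines, with each record closed by a "$$$$" line. The field names ID and IC50_uM and the values EX-1 and EX-2 are made up for illustration and are not the project's actual fields.

```python
# A one-atom molblock used only as filler for the example records.
MOLBLOCK = (
    "\n  sketch\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
)

# Two hypothetical records; field names and values are invented.
SAMPLE = (
    MOLBLOCK + "> <ID>\nEX-1\n\n> <IC50_uM>\n1.2\n\n$$$$\n"
    + MOLBLOCK + "> <ID>\nEX-2\n\n> <IC50_uM>\n0.4\n\n$$$$\n"
)


def parse_sdf(text):
    """Split SD file text into records: the molblock plus its data fields."""
    records = []
    mol_lines, fields, current, in_data = [], {}, None, False
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of record: store it and reset the per-record state.
            records.append({"molblock": "\n".join(mol_lines), "fields": fields})
            mol_lines, fields, current, in_data = [], {}, None, False
        elif line.startswith(">") and "<" in line:
            # Data header such as "> <ID>": take the name between < and >.
            current = line[line.index("<") + 1 : line.rindex(">")]
            fields[current] = ""
            in_data = True
        elif in_data:
            if not line.strip():
                current = None  # a blank line closes the current data item
            elif current is not None:
                old = fields[current]
                fields[current] = f"{old}\n{line}" if old else line
        else:
            mol_lines.append(line)  # still inside the molblock
    return records


records = parse_sdf(SAMPLE)
for record in records:
    print(record["fields"])
```

The data fields come out easily, but the structure itself is still just a block of coordinates, which is exactly the part a casual reader cannot browse without a drawing tool.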

So what is needed? Well, we're batch-uploading data to ChEMBL. If we could do an automatic (daily) upload to ChEMBL then the problem would be solved, since ChEMBL is very cool and is doing cool things with visualization.

But another possible solution is to set up a system where the sd file is displayed on a webpage with a static address (so it can be bookmarked). Whenever the sd file is updated, the webpage would be rendered afresh the next time it is loaded. The page would need to show the structures, and be interactive so that the data can be re-ordered on demand, as in a spreadsheet.
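As a rough sketch of the re-orderable page, the Python below turns a list of rows into a single HTML page whose columns sort when a header is clicked. It is only an illustration under assumptions: the function name render_table, the column names and the example rows are invented, and the structure pictures are left out because drawing them needs a cheminformatics toolkit or a browser-side drawing library.

```python
from html import escape

# Small browser-side script: clicking a header sorts the table by that column,
# numerically when both cells parse as numbers, alphabetically otherwise.
SORT_SCRIPT = """<script>
document.querySelectorAll('th').forEach((th, i) => th.onclick = () => {
  const tb = th.closest('table').tBodies[0];
  const rows = Array.from(tb.rows);
  const dir = th.dataset.dir = th.dataset.dir === 'asc' ? 'desc' : 'asc';
  rows.sort((a, b) => {
    const x = a.cells[i].textContent, y = b.cells[i].textContent;
    const nx = parseFloat(x), ny = parseFloat(y);
    const c = (!isNaN(nx) && !isNaN(ny)) ? nx - ny : x.localeCompare(y);
    return dir === 'asc' ? c : -c;
  });
  rows.forEach(r => tb.appendChild(r));
});
</script>"""


def render_table(rows, columns):
    """Return a complete HTML page showing `rows` as a click-to-sort table."""
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(str(r.get(c, '')))}</td>" for c in columns)
        + "</tr>"
        for r in rows
    )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<title>Compounds</title></head><body>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
        + SORT_SCRIPT
        + "</body></html>"
    )


# Hypothetical rows, e.g. the data fields read from an sd file.
example_rows = [
    {"ID": "EX-1", "SMILES": "C", "IC50_uM": "1.2"},
    {"ID": "EX-2", "SMILES": "CC", "IC50_uM": "0.4"},
]
page = render_table(example_rows, ["ID", "SMILES", "IC50_uM"])
```

If a script like this were re-run whenever the sd file changed and the output written to the same address, the bookmark would always show the latest data.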

The main sd file has now been joined by an Excel sheet and sd file of lots of exciting new molecules for the latest series the project's looking at:

We coincidentally need to combine these two sd files, and we need to browse the new structures because we need to think about which molecules to make next in that new series.
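Combining two sd files is mostly a matter of concatenating their records while skipping duplicates. The sketch below is one way to do that in plain Python, assuming each record carries an identifier field to compare on; the field name InChIKey used as the default is an assumption, and any field present in both files would do.

```python
def split_records(text):
    """Return the records of an SD file, each without its '$$$$' terminator."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def field_value(record, name):
    """Return the first line of data field `name`, or None if it is absent."""
    lines = record.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.startswith(">") and f"<{name}>" in line:
            return lines[i + 1].strip()
    return None


def merge_sdf(first, second, key="InChIKey"):
    """Concatenate two SD files, dropping records whose `key` was already seen.

    Records that lack the key field are always kept, since they cannot be
    compared with anything.
    """
    seen, merged = set(), []
    for record in split_records(first) + split_records(second):
        value = field_value(record, key)
        if value is not None:
            if value in seen:
                continue  # duplicate of an earlier record
            seen.add(value)
        merged.append(record)
    return "".join(r + "\n$$$$\n" for r in merged)
```

When a molecule appears in both files, this keeps the copy from the first file, so the main sd file should be passed as the first argument if its entries are the ones to trust.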

I know that there are solutions that are appropriate for cheminformaticians. We need solutions for people who are happy with email and web browsers only.

Any ideas?

Egon Willighagen spoke about possible solutions using the sd file during the previous OSM project meeting (


