These are the slides of my talk “Structure Generation, Metabolite Space, and Metabolite Likeness”, which I gave at the Unilever Center for Molecular Informatics in Cambridge, UK, as part of my UK road tour in November 2011.
Structure Generation, Metabolite Space, and Metabolite Likeness
The chemoinformatics tools presented here are developed for metabolomics, specifically for metabolite identification. From our mass spectrometry experiments, and after data analysis, we want to identify certain metabolites of interest.
For these molecules we might know the elemental composition (which elements they contain and how many atoms of each) and perhaps one or more fragments (a ring, a chain, or a functional group).
With this information we have to propose candidate structures for our unknowns. If we cannot find them in a database, we have to generate the chemical structures computationally, using a structure generator.
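As a small illustration of what an elemental composition gives us before any structure is generated, the following sketch parses a molecular formula into element counts and derives the number of rings plus double bonds. It is not part of the tools from the talk: the function names are hypothetical, the parser only handles flat formulas such as C6H12O6 (no brackets or charges), and the ring-and-double-bond formula assumes standard valences (C 4, N 3, O and S 2, H and halogens 1).

```python
import re
from collections import Counter

# One element symbol followed by an optional count, e.g. "C6", "H", "Cl2".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Return element counts for a flat formula such as 'C6H12O6'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, number in TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def rings_plus_double_bonds(counts):
    """Rings plus double bonds, assuming standard valences.

    Divalent atoms (O, S) do not change the value, so they are ignored.
    """
    halogens = sum(counts[x] for x in ("F", "Cl", "Br", "I"))
    return (2 * counts["C"] + 2 + counts["N"] - counts["H"] - halogens) / 2


if __name__ == "__main__":
    for formula in ("C6H12O6", "C6H6", "C9H11NO2"):
        counts = parse_formula(formula)
        print(formula, dict(counts), rings_plus_double_bonds(counts))
```

A value of 0 means every candidate must be saturated and acyclic, while a value of 4 or more leaves room for an aromatic ring. Simple constraints like this already limit what a structure generator has to consider.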
Chemical Structure Generation
In this part of the talk I describe the structure generator I have developed. It relies on the canonical augmentation approach proposed by Brendan McKay and makes use of the Chemistry Development Kit (CDK).
The main use of this tool is the following: for a given elemental composition and one or more prescribed non-overlapping fragments, it exhaustively produces all chemical structures without duplicates.
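To make “exhaustive and without duplicates” concrete, here is a deliberately naive sketch. It is not the generator from the talk and it uses neither CDK nor canonical augmentation: it enumerates every labelled acyclic carbon skeleton for n carbon atoms (hydrogens implicit, single bonds only, at most four bonds per carbon) and removes duplicates by brute-force canonical labelling, which is only feasible for very small n. All function names are hypothetical.

```python
from itertools import combinations, permutations


def is_tree(n, edges):
    """True if the n-1 given edges contain no cycle (and so connect all n atoms)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this edge would close a ring
        parent[root_a] = root_b
    return True


def canonical_form(n, edges):
    """Smallest sorted edge list over all relabellings of the atoms.

    Two skeletons are the same structure exactly when their canonical forms match.
    """
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(
            sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges)
        )
        if best is None or relabelled < best:
            best = relabelled
    return best


def carbon_skeletons(n, max_degree=4):
    """All distinct acyclic skeletons of n carbons, one canonical form each."""
    all_pairs = list(combinations(range(n), 2))
    seen = set()
    for edges in combinations(all_pairs, n - 1):
        degree = [0] * n
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        if max(degree) > max_degree or not is_tree(n, edges):
            continue
        seen.add(canonical_form(n, edges))
    return sorted(seen)


if __name__ == "__main__":
    for n in (4, 5):
        print(n, len(carbon_skeletons(n)))
```

For four carbons this yields the two skeletons of butane and isobutane, and for five carbons the three pentane isomers. The sketch generates every labelled structure first and filters afterwards, which is exactly the wasted work that canonical augmentation is designed to avoid: that approach rejects duplicates while the structure is being built, one atom or bond at a time.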
Since the output list can be very large, we want to keep only those molecules that are likely to be metabolites. Therefore, we have developed a model that predicts the Metabolite Likeness of a molecule as a percentage, that is, how likely the molecule is to be a metabolite.
Metabolite Space and Metabolite Likeness
We have combined 3 classifiers and 5 molecular representations to build metabolite likeness models. We wanted to see which combination discriminates best between metabolites and non-metabolites.
The best models have been validated with a prospective validation set to assess whether they can correctly classify new, unseen molecules.
We expect to use the best model to rank candidate structures for unknown metabolites. We also hope these tools will help scientists working in metabolite identification.