
MetiTree: a web application to organize and process high resolution multi-stage mass spectrometry metabolomics data

I am co-author of a new publication in the field of metabolomics and metabolite identification. Its title is “MetiTree: a web application to organize and process high resolution multi-stage mass spectrometry metabolomics data”.


Miquel Rojas-Cherto; Michael van Vliet; Julio E. Peironcely; Ronnie van Doorn; Maarten Kooyman; Tim te Beek; Marc A. van Driel; Thomas Hankemeier; Theo Reijmers

MetiTree: a web application to organize and process high resolution multi-stage mass spectrometry metabolomics data (Article)

Bioinformatics, Page(s): 2–4, 2012.



SUMMARY: Identification of metabolites using high resolution multistage mass spectrometry (MS(n)) data is a significant challenge demanding access to all sorts of computational infrastructures. MetiTree is a user-friendly, web application dedicated to organize, process, share, visualize, and compare MS(n) data. It integrates several features to export and visualize complex MS(n) data, facilitating the exploration and interpretation of metabolomics experiments.

A dedicated spectral tree viewer allows the simultaneous presentation of three related types of MS(n) data, namely, the spectral data, the fragmentation tree, and the fragmentation reactions. MetiTree stores the data in an internal database to enable searching for similar fragmentation trees and matching against other MS(n) data. As such MetiTree contains much functionality that will make the difficult task of identifying unknown metabolites much easier.

AVAILABILITY: MetiTree is accessible at The source code is available here.

In Simple Words

MetiTree improves metabolite identification by providing a place to store, process, and compare MSn data. With MetiTree you can:

  • Process your raw MSn data files and determine the elemental composition of your unknown metabolite using the MEF tool.
  • Compare your MSn data to the other MSn data you stored in MetiTree and find similar trees using our fragmentation tree fingerprint method.
  • Visualize your MSn trees.

Metabolite Identification Using Automated Comparison of High-Resolution Multistage Mass Spectral Trees

I am co-author of a new publication in the field of metabolomics and metabolite identification. Its title is “Metabolite Identification Using Automated Comparison of High-Resolution Multistage Mass Spectral Trees”.


Miquel Rojas-Cherto; Julio E. Peironcely; Piotr T. Kasper; Justin J. J. van der Hooft; Ric C. H. de Vos; Rob Vreeken; Thomas Hankemeier; Theo Reijmers

Metabolite Identification Using Automated Comparison of High-Resolution Multistage Mass Spectral Trees (Article)

Analytical Chemistry, 2012.


Multistage mass spectrometry (MS(n)) generating so-called spectral trees is a powerful tool in the annotation and structural elucidation of metabolites and is increasingly used in the area of accurate mass LC/MS-based metabolomics to identify unknown, but biologically relevant, compounds. As a consequence, there is a growing need for computational tools specifically designed for the processing and interpretation of MS(n) data.

Here, we present a novel approach to represent and calculate the similarity between high-resolution mass spectral fragmentation trees. This approach can be used to query multiple-stage mass spectra in MS spectral libraries. Additionally the method can be used to calculate structure-spectrum correlations and potentially deduce substructures from spectra of unknown compounds. The approach was tested using two different spectral libraries composed of either human or plant metabolites which currently contain 872 MS(n) spectra acquired from 549 metabolites using Orbitrap FTMS(n).

For validation purposes, for 282 of these 549 metabolites, 765 additional replicate MS(n) spectra acquired with the same instrument were used. Both the dereplication and de novo identification functionalities of the comparison approach are discussed. This novel MS(n) spectral processing and comparison approach increases the probability to assign the correct identity to an experimentally obtained fragmentation tree.

Ultimately, this tool may pave the way for constructing and populating large MS(n) spectral libraries that can be used for searching and matching experimental MS(n) spectra for annotation and structural elucidation of unknown metabolites detected in untargeted metabolomics studies.

In Simple Words

Similar metabolites have similar mass spectral trees. Imagine you have a collection or database of mass spectral trees of known metabolites, i.e. for which you know the chemical structure.

Now you have a mass spectral tree of an unknown metabolite. We propose a method that uses the similarity between trees to identify the unknown in one of three ways:

  • By finding a tree in the database that is 100% similar. In that case we can assign the identity to the unknown, because we already had it in the database.
  • By finding several similar (but not 100% similar) metabolites in the database. If these metabolites belong to the same sub-class, let’s say amino acids, we assign that class to the unknown: it should also be an amino acid.
  • By finding a piece (a ring or a scaffold) that these similar metabolites have in common, which we speculate is also present in the structure of the unknown metabolite. This piece, the Maximum Common Substructure, can help in metabolite identification and can be used in a Computer Assisted Structure Elucidation process to propose candidate structures for the unknown that contain it.
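
To make the first two cases concrete, here is a minimal Python sketch. It is only an illustration and not the method from the paper: it assumes each tree has already been reduced to a set of feature labels (a hypothetical fingerprint) and it uses the Tanimoto coefficient as the similarity measure. The library entries, the feature labels, the class names and the 0.3 cut-off are all made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_library(query, library, cutoff=0.3):
    """Return (name, class, similarity) for library trees at or above the cutoff, best first."""
    hits = [
        (name, entry["class"], tanimoto(query, entry["fingerprint"]))
        for name, entry in library.items()
    ]
    hits = [hit for hit in hits if hit[2] >= cutoff]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def interpret(hits):
    """A 100% match gives an identity; hits that all share one class give that class."""
    if not hits:
        return ("unknown", None)
    if hits[0][2] == 1.0:
        return ("identity", hits[0][0])
    classes = {hit[1] for hit in hits}
    if len(classes) == 1:
        return ("class", classes.pop())
    return ("ambiguous", None)


# Hypothetical toy library: the labels stand for fragments or neutral losses.
library = {
    "phenylalanine": {
        "class": "amino acid",
        "fingerprint": {"C9H10NO2", "loss_HCOOH", "loss_NH3", "C8H10N"},
    },
    "tyrosine": {
        "class": "amino acid",
        "fingerprint": {"C9H10NO3", "loss_HCOOH", "loss_NH3", "C8H10NO"},
    },
    "adenosine": {
        "class": "nucleoside",
        "fingerprint": {"C10H14N5O4", "loss_ribose", "C5H6N5"},
    },
}

# Made-up fingerprint of an unknown that shares features with the amino acids.
unknown = {"loss_HCOOH", "loss_NH3", "C8H10N", "C9H12NO2"}
hits = rank_library(unknown, library)
result = interpret(hits)
print(result)  # ('class', 'amino acid')
```

The third case, the Maximum Common Substructure, needs the chemical structures of the hits and a cheminformatics toolkit, so it is left out of this sketch.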


One Difference Between Web And Biotech Startups


Photo by 401K

The difference between web and biotech startups is risk.

For a web startup it is easy to build a product that works. Hire some talented programmers and designers and you will have a working version of your idea. The risk lies in getting people to like and choose this product.

For a biotech startup the risky part is going from an idea to a finished product. A possible treatment for a disease has to work well and be safe. Hiring reputable scientists to lead the development phase is no guarantee that the complexity of biology will not threaten your startup. Once the product is finished, adoption by customers is easy.

Metabolite Retention Time Prediction, Help Needed

This is a call for all the experts in HPLC metabolite retention time prediction, or QSRR. Would you build a model using the following data or just give up? Please share your two cents.

Would you build a metabolite retention time prediction model with these data?

I got a dataset from some HPLC experiments, along with a mission: can you build a Retention Time (RT) prediction model with these data?

I shared a Google Spreadsheet with the Retention Time of the metabolites.

We are doing metabolite identification, and the plan is to use such a model to reject candidate structures for unknown metabolites. When we want to identify a metabolite, we will have LC-MSn data, that is, retention time, mass and maybe known substructures for the unknown metabolite.

I would propose candidate structures either by mining databases like HMDB or PubChem, or via computer assisted structure generation. Next I would use my model to predict the RT and reject those structures whose predicted RT is far off from the experimental one.
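
The rejection step could look roughly like the Python sketch below. It is written under assumptions: `predict_rt` stands in for whatever model ends up being built, the predicted values are made up, and the 1.0 minute tolerance is only a placeholder, since the acceptable deviation is exactly one of the open questions.

```python
def filter_candidates_by_rt(candidates, experimental_rt, predict_rt, tolerance_min):
    """Split candidates into (kept, rejected) by |predicted RT - experimental RT|."""
    kept, rejected = [], []
    for candidate in candidates:
        error = abs(predict_rt(candidate) - experimental_rt)
        (kept if error <= tolerance_min else rejected).append(candidate)
    return kept, rejected


# Stand-in for a real QSRR model: made-up predicted RTs (minutes) per candidate.
made_up_predictions = {"candidate_A": 2.1, "candidate_B": 15.4, "candidate_C": 3.0}

kept, rejected = filter_candidates_by_rt(
    candidates=list(made_up_predictions),
    experimental_rt=2.5,                 # measured RT of the unknown, in minutes
    predict_rt=made_up_predictions.get,  # hypothetical predictor
    tolerance_min=1.0,                   # placeholder tolerance, in minutes
)
print(kept)      # ['candidate_A', 'candidate_C']
print(rejected)  # ['candidate_B']
```

A fixed tolerance in minutes is the simplest choice. A tolerance derived from the prediction error of the model on held-out metabolites would probably be easier to defend.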

My concerns about the data

  1. We have 161 metabolites with an HMDB_Id and RT (which was measured twice). Notice that 118 of these have RT between 1.1 and 10 minutes (most of them between 1.1 and 3), and only 43 metabolites have RT between 10 and 40 min. This doesn’t look well distributed. That’s the way it is.
  2. Seven standards were added (tyrosine, adenosine, tryptophan, phenylalanine, biotin, LPC-17, LPC-19), which I could use to correct the experimental RT, as is done with Kovats indices in gas chromatography (GC). But these standards only show RT for 77 of the 161 metabolites. What to do with this? Building a model with only 77 RTs sounds like too few data points, which could lead to overfitting the model.
  3. How to use the standards to generate indices?
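
One possible way to generate indices from the standards, sketched here under assumptions, is a linear retention index: give the standards the values 100, 200, 300 and so on in elution order, and interpolate linearly between the two standards that bracket each metabolite, in the spirit of the linear retention indices used in temperature-programmed GC. The standard RTs in the example are made up, and the function name is hypothetical.

```python
from bisect import bisect_right


def retention_index(rt, standard_rts):
    """Linear retention index of `rt` relative to bracketing standards.

    `standard_rts` must be strictly increasing and measured in the same run;
    the k-th standard (0-based) is assigned the index 100 * (k + 1).
    Returns None when `rt` falls outside the range covered by the standards,
    because this sketch does not extrapolate.
    """
    if len(standard_rts) < 2:
        raise ValueError("need at least two standards to interpolate")
    if not standard_rts[0] <= rt <= standard_rts[-1]:
        return None
    # Position of the standard eluting at or just before `rt`,
    # capped so that the last standard closes the final interval.
    k = min(bisect_right(standard_rts, rt) - 1, len(standard_rts) - 2)
    lower, upper = standard_rts[k], standard_rts[k + 1]
    return 100.0 * (k + 1 + (rt - lower) / (upper - lower))


# Made-up RTs (minutes) of four standards in one run.
standard_rts = [2.0, 4.0, 10.0, 30.0]
print(retention_index(3.0, standard_rts))   # 150.0
print(retention_index(30.0, standard_rts))  # 400.0
print(retention_index(1.0, standard_rts))   # None
```

An index like this can only be computed for runs in which the standards were actually detected, which brings back the problem of the 77 usable RTs.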

What kind of data is this? 

This is what I know so far about the experimental setup.
All reagents used were of HPLC grade purity or higher, purchased from Sigma-Aldrich (Gillingham, UK).

Preparation of urine samples
Urine samples were collected in the morning from healthy volunteers, 3 males and 2 females in total. The samples were diluted with water in a ratio of 1:1 (v/v), giving 2 ml in total. The diluted sample was centrifuged at 16.1 krpm at 10 °C and the supernatant was collected afterwards. 375 µl of the supernatant was transferred to a tube and 75 µl of the academic mixture from 2.1 was added. One pooled urine sample was made from all volunteers by combining 75 µl from each volunteer to a volume of 375 µl.

Reproducibility study
Two kinds of reproducibility were checked in positive ion mode: that of the chromatography and that of the fragmentation. The internal standards were used to test the reproducibility of the LC by checking them in the samples of each of the volunteers and in the pooled sample. The total length of the study was 54 runs (9 runs for each sample). For the fragmentation reproducibility, tyrosine (0.01 mg/ml) was injected 40 times from 40 different wells and the differences in the mass to charge ratio were studied.

HPLC/LTQ Orbitrap XL operation
Samples were analyzed in positive ion mode and in a randomized order using an Agilent 1200 with a flow of 250 µl, coupled to a reversed phase Atlantis C18 T3 column (ID 2.1 × 100 mm, particle size 3 µm) linked to the nano ESI source (TriVersa NanoMate, Advion) and the LTQ-Orbitrap XL (Thermo Finnigan). The column was eluted with 2 solvents to create a gradient. Solvent A consisted of 98% H2O + 2% acetonitrile + 0.1% formic acid (v/v); solvent B consisted of 98% acetonitrile + 2% H2O + 0.1% formic acid (v/v). To improve reproducibility, a thermostat was placed over the column to minimize the effect of room temperature changes during the day. 5 µl of sample was injected in each run. The injection loop of the LC was 40 cm. Centroided mass spectra were acquired over the range 60-1000 m/z using the LTQ-Orbitrap at a resolution of 60,000. All samples not in use were stored at -20 °C.

If your answer is doing the HPLC experiment again

My first goal would be to use the current data to build a model and test it in metabolite identification, keeping in mind the statisticians’ motto about data quality: “crap in, crap out”. If the data doesn’t allow me to build a model, so be it.
In any case, I might have some student doing similar experiments again, so I could redo the experiments.
  1. How would you collect enough data to make a reasonable model?
  2. Add known compounds to the urine to have more data points?
  3. Use the same standards we have used and make sure they are measured for every data point?
  4. What deviation of predicted RT from experimental is acceptable to reject candidate structures?


If you have any useful tip, please leave a comment or send me an email at julio{at}

and I will be forever grateful.