2 easy steps to create your standard metadata file

Parul Bindal
May 27, 2019

Metadata is information about data. For a bioinformatician who works with metabolomics data, creating a metadata file is a standard and repetitive part of the day-to-day workflow. During a mass spectrometry experiment in a metabolomics lab, the bioinformatician enters the samples’ relative concentrations into the machine. The machine’s output is then fed to tools such as El-MAVEN or Multiquant, which map out intensities for those samples.

However, some downstream analyses, such as Kinetic Flux Analysis, require the absolute concentrations of unknown samples (biosamples). That’s where Polly QuantFit comes in handy.

Fig 1. User Flow

Polly QuantFit is a cloud-based app that helps with the absolute quantification of metabolites. For standard (known) samples, intensities are mapped to concentrations to obtain the best curve fit. Unknown samples are then mapped onto this curve to obtain their concentrations.

Fig 2. Polly QuantFit
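
The curve-fitting idea can be illustrated with a minimal sketch. This post does not describe which curve models Polly QuantFit uses, so a straight line fitted by ordinary least squares is assumed here. The function names and the numbers are hypothetical and serve only as an illustration.

```python
def fit_line(concentrations, intensities):
    """Least-squares fit of intensity = slope * concentration + intercept.

    A linear model is an assumption made for this sketch only.
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(intensity, slope, intercept):
    """Invert the fitted line to estimate the concentration of an unknown sample."""
    return (intensity - intercept) / slope


# Hypothetical standard samples: known concentrations and measured intensities.
std_concentrations = [1.0, 2.0, 4.0, 8.0]
std_intensities = [105.0, 198.0, 402.0, 795.0]

slope, intercept = fit_line(std_concentrations, std_intensities)

# Hypothetical unknown sample with a measured intensity of 300.
print(estimate_concentration(300.0, slope, intercept))
```

The standard samples fix the curve, and every unknown sample is then read off that curve. This is why the standard metadata (which samples are standards, and at what concentration) has to be supplied first.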

Why do we need standard metadata?

Standard metadata tells us two things:
a. Among all the samples, which ones are the standard samples?
b. What is the concentration of each metabolite in each standard sample?

Fig 3. What the metadata file looks like

The problem in the traditional way of creating a metadata file

Fig 4. Number of samples and metabolites

Fig 4 shows the number of samples and metabolites in a typical metabolomics dataset.

Once the user has identified the standard samples, the next step is to fill in their concentrations in an Excel file.

Now, let’s do some maths to understand what it means for a user in two scenarios:
a. Best case: 5 standard samples and 100 metabolites
Number of cells to be filled: 5 × 100 = 500
b. Worst case: 10 standard samples and 500 metabolites
Number of cells to be filled: 10 × 500 = 5000

Clearly, the problem here is the time, complexity and risk of error involved in manually filling 5000 cells in an Excel sheet for a single dataset.

How did Polly solve it?

a. Select and move standard samples
Polly auto-detects samples whose names are prefixed with std/STD, the common naming convention, and treats them as standard samples. It also lets the user select samples and move them between the two lists if a sample was missed or if the user follows a different naming convention.

Fig 5. Select Standard Samples
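
The auto-detection and manual move described in this step can be sketched roughly as follows. This is not Polly's implementation. The function names are hypothetical, and matching the prefix case-insensitively (so that both std and STD are caught) is an assumption.

```python
def split_samples(sample_names, prefix="std"):
    """Split sample names into (standards, unknowns) based on a name prefix.

    The comparison ignores case, which is an assumption of this sketch.
    """
    standards, unknowns = [], []
    for name in sample_names:
        if name.lower().startswith(prefix.lower()):
            standards.append(name)
        else:
            unknowns.append(name)
    return standards, unknowns


def move_sample(name, source, target):
    """Move one sample between lists, e.g. when the auto-detection missed it."""
    source.remove(name)
    target.append(name)


# Hypothetical sample names; "cal_3" follows a different naming convention.
samples = ["STD_1", "std_2", "sample_A", "cal_3"]
standards, unknowns = split_samples(samples)

# The user corrects the lists by hand.
move_sample("cal_3", unknowns, standards)
print(standards, unknowns)
```

Keeping the manual move separate from the detection rule means a lab with its own naming convention can still fix the lists without renaming samples.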

b. Fill concentrations
The next step is to fill in the concentrations for all the metabolites.

Fig 6. Fill Concentrations of all metabolites for all samples

This step differs from filling in concentrations in an Excel file in three ways:

  • A very common use case is to enter the same concentration for all metabolites in a given sample. In Excel, the user would need to either type these concentrations one by one or copy and paste the values into every column. Polly lets the user apply the same concentration to all or selected metabolites in just one click.
  • Empty cells can be filled with 0 or NA: 0 when the concentration of the standard is zero, and NA when the standard concentrations apply only to a subset of metabolites.
  • To find a metabolite in one of the last columns, the user can type its name in the search bar instead of scrolling all the way, and the screen slides to that metabolite automatically.
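
The three conveniences above can be sketched on a small in-memory grid. This is an illustration only, not Polly's code. The data layout (a dictionary of rows), the function names and the sample values are all hypothetical.

```python
def make_table(standards, metabolites):
    """Create an empty grid: one row per standard sample, one cell per metabolite."""
    return {s: {m: None for m in metabolites} for s in standards}


def apply_concentration(table, sample, value, metabolites=None):
    """Set one concentration for all metabolites of a sample, or only for selected ones."""
    targets = list(table[sample]) if metabolites is None else metabolites
    for m in targets:
        table[sample][m] = value


def fill_empty(table, fill_value):
    """Fill every cell that is still empty with 0 or "NA"."""
    for row in table.values():
        for m, v in row.items():
            if v is None:
                row[m] = fill_value


def find_metabolite(metabolites, query):
    """Return the column index of the first metabolite matching the query, or None."""
    q = query.lower()
    for i, name in enumerate(metabolites):
        if q in name.lower():
            return i
    return None


# Hypothetical standards and metabolites.
standards = ["STD_1", "STD_2"]
metabolites = ["glucose", "lactate", "citrate"]
table = make_table(standards, metabolites)

apply_concentration(table, "STD_1", 10.0)                         # all metabolites
apply_concentration(table, "STD_2", 5.0, metabolites=["glucose"])  # selected only
fill_empty(table, "NA")                                           # the rest do not apply

print(table)
print(find_metabolite(metabolites, "cit"))
```

With 10 standards and 500 metabolites, the same three calls would cover all 5000 cells, which is the saving the interface is aiming for.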

Polly has reduced the time and UX complexity of filling 5000 cells in an Excel sheet to just a few clicks.

What next?

a. This format will be implemented in other metadata interfaces, such as the Sample Cohort and Normalization Factor interfaces.

Fig 8. Sample Cohort Interface

b. Work is under way to implement the interface for MS/MS data as well.

Fig 9. Metadata Interface for MS/MS Data

c. We will continue to act on the critical insights gained from our users to further improve the user experience.

At Elucidata, we are working to solve more such challenges faced by scientists through our platform, Polly, to accelerate the process of target discovery. To know more about Polly, click here.
