By the basic definition, a baseline is a broad trend that causes the true signal to “drift” along the y-axis. One example is a continuous excess energy reading caused by heat building up in the sensor plate over time. This component has a very low frequency and should not change much within a 5 THz range. But you already knew that, right? You came here to find out which baseline algorithm is the best.
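In this article we compare three known algorithms for baseline correction in mass spec, namely the Thresholding algorithm, the TopHat filter, and the Asymmetric Least Squares Smoothing (AsLSS) algorithm.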
At the end of the article, we compare the three head-to-head on two major accuracy parameters across four real-world scenarios, but first let us briefly define these algorithms.
We used this algorithm in our tool, El-MAVEN, to define the threshold baseline, i.e. to filter out noise. It was part of the original Maven project written by Eugene Melamud and is defined using two parameters:
However, we faced some challenges with this baseline that led us to explore the other options described below. These drawbacks were:
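Since the two parameters are not spelled out in this section, the sketch below only illustrates what a threshold-style baseline can look like. It assumes, hypothetically, one parameter that drops the top share of intensities (treated as peaks) and one moving-average smoothing window. It is not El-MAVEN’s implementation; the names `drop_top_percent` and `smoothing_window` and their defaults are placeholders.

```python
import numpy as np


def threshold_baseline(intensity, drop_top_percent=80.0, smoothing_window=5):
    """Threshold-style baseline sketch (illustrative, not El-MAVEN's code).

    Hypothetical parameters:
      drop_top_percent: share of the highest intensities treated as signal
      smoothing_window: width, in scans, of the moving-average smoother
    """
    y = np.asarray(intensity, dtype=float)
    # Intensity below which (100 - drop_top_percent)% of the points fall.
    cutoff = np.percentile(y, 100.0 - drop_top_percent)
    # Clip everything above the cutoff so peaks cannot lift the baseline.
    clipped = np.minimum(y, cutoff)
    # Moving average; edge padding keeps the output the same length as the input.
    left = (smoothing_window - 1) // 2
    right = smoothing_window - 1 - left
    padded = np.pad(clipped, (left, right), mode="edge")
    kernel = np.ones(smoothing_window) / smoothing_window
    return np.convolve(padded, kernel, mode="valid")
```

Because the cutoff is a single global intensity, a baseline like this cannot follow a slowly rising drift well, which is one plausible reason a fixed threshold struggles on drifting EICs.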
The TopHat filter takes a very different approach to baseline correction from AsLSS, in which an ideal model of baseline behaviour is built into the algorithm. The TopHat filter is a well-known operation in morphological image analysis, but it can be applied to any kind of uniformly distributed signal to obtain its baseline-subtracted “true” form. It features:
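As a minimal sketch of the TopHat idea, assuming SciPy is available: the baseline is the morphological opening of the signal (a running minimum followed by a running maximum), and the white top-hat is the signal minus that opening. `structure_size` is a hypothetical parameter for the width of the flat structuring element, which should be wider than the widest peak; this is the textbook operation, not necessarily the exact variant evaluated here.

```python
import numpy as np
from scipy.ndimage import grey_opening


def tophat_baseline(intensity, structure_size=51):
    """Morphological top-hat baseline sketch.

    structure_size (hypothetical): width, in scans, of the flat structuring
    element; features narrower than this are treated as peaks.
    """
    y = np.asarray(intensity, dtype=float)
    # Opening = erosion (running minimum) then dilation (running maximum).
    # It removes features narrower than the structuring element.
    baseline = grey_opening(y, size=structure_size)
    # White top-hat: the signal minus its opening, i.e. the baseline-subtracted signal.
    corrected = y - baseline
    return baseline, corrected
```

No model of the baseline shape is assumed; the only tuning knob is the element width, which is why the result depends strongly on how wide the peaks are relative to that width.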
The Asymmetric Least Squares Smoothing algorithm (AsLSS) uses smoothing, controlled by λ, to produce a slowly varying trend of the signal. AsLSS also has an “asymmetry” parameter, denoted p, that can be used to favour only one type of peak, either positive or negative. PS: This is the Elucidata choice. For our purpose, the algorithm uses a variant of the “Whittaker” smoother in which positive deviations from the baseline are weighted much less than negative ones. This algorithm has several advantages, such as:
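The weighting scheme described above can be sketched with the standard asymmetric least squares formulation: repeatedly solve the Whittaker system (W + λDᵀD)z = Wy, where D is the second-difference operator, and then give points above the current trend weight p and the rest weight 1 − p. This is a minimal sketch assuming SciPy sparse solvers; the defaults for `lam`, `p` and `n_iter` are illustrative, and it is not El-MAVEN’s code.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def asls_baseline(intensity, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline sketch (Whittaker smoother + asymmetric weights)."""
    y = np.asarray(intensity, dtype=float)
    n = y.size
    # Second-order difference operator D with shape (n - 2, n).
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    # Roughness penalty lam * D'D; larger lam gives a smoother trend.
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w)
        # Solve (W + lam * D'D) z = W y for the smooth trend z.
        z = spsolve(sparse.csc_matrix(W + penalty), w * y)
        # Asymmetry: points above the trend (likely peaks) get weight p,
        # points on or below it get 1 - p, so peaks barely pull the baseline up.
        w = np.where(y > z, p, 1.0 - p)
    return z
```

With a small p (for example 0.01) positive peaks are largely ignored; swapping the roles of p and 1 − p would instead favour negative peaks.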
Here is the part you’ve been waiting for. The accuracy comparison of the Thresholding, AsLSS and TopHat algorithms is shown below.
The true baseline was determined by manually curating EICs (extracted ion chromatograms) with the correct baseline and the different peak patterns described in the cases below. The true baselines were recorded, and drift was then added to each of these EICs.
| Metric | Threshold Algorithm | AsLSS Algorithm | TopHat |
|---|---|---|---|
| Mean R² value | 0.7422 | 0.9658 | 0.9528 |
| Mean relative area difference when there are two or more peaks | 19.14 | 8.06 | 19.22 |
| Mean relative area difference when two peaks are close to each other | 18.80 | 12.11 | 21.67 |
| Mean relative area difference when two close peaks have a significant difference in intensity | 29.56 | 21.83 | 27.89 |
| Mean relative area difference when there is a noisy EIC with distinct peaks | 35.15 | 4.21 | 13.32 |
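The exact formulas behind the two metrics in the table are not given here. One plausible reading, sketched below as an assumption: R² compares the estimated baseline with the true baseline, and the relative area difference compares the peak area after subtracting each baseline, expressed in percent. The window bounds `start`/`stop` and the plain-sum area (uniform scan spacing) are hypothetical choices.

```python
import numpy as np


def r_squared(true_baseline, est_baseline):
    """Coefficient of determination of an estimated baseline against the true one."""
    t = np.asarray(true_baseline, dtype=float)
    e = np.asarray(est_baseline, dtype=float)
    ss_res = np.sum((t - e) ** 2)
    # Undefined for a perfectly flat true baseline (ss_tot == 0).
    ss_tot = np.sum((t - t.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def relative_area_difference(intensity, true_baseline, est_baseline, start, stop):
    """Percent difference in peak area over scans [start, stop) after baseline subtraction."""
    y = np.asarray(intensity, dtype=float)[start:stop]
    t = np.asarray(true_baseline, dtype=float)[start:stop]
    e = np.asarray(est_baseline, dtype=float)[start:stop]
    # Uniform scan spacing assumed, so the area is a plain sum over scans.
    area_true = np.sum(y - t)
    area_est = np.sum(y - e)
    return 100.0 * abs(area_est - area_true) / abs(area_true)
```

Under this reading, a higher R² and a lower relative area difference both indicate a baseline closer to the curated one.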
The verdict is in, and we have a clear winner: AsLSS. It estimated the baseline better than both the Thresholding and TopHat algorithms, in terms of both goodness of fit (R²) and the area under the peak relative to the true baseline of an EIC. Now that you’ve read about it, see this algorithm in action: download El-MAVEN and see the difference for yourself. Please note that all these algorithms and comparisons pertain to mass spec analyses only.