Will I Ever Understand What p-value Really Means?

The p-value and hypothesis testing are among the concepts that almost everyone finds difficult at first, largely because schools and statistics classes usually teach how to calculate a p-value rather than what it means. People who are first introduced to hypothesis testing and p-values often fail to understand why they are doing what they’re doing, and resort to memorising formulas, searching online for p-value calculators, or cramming the golden rule:

p < 0.05 implies it is significant
p > 0.05 implies it is not significant.

But today we are here with an intuitive way of understanding what a p-value means, how to calculate it, and why we even need it, to clear part of the mental cloud that surrounds it. Let us begin with a simple example.

Imagine you’re a teacher in a school and you have a class of 30 students. You give them a test, most of them fail miserably, and the principal comes in and tells you to do something, or you’re fired!

With looming tensions and sleeplessness, you somehow devise a new method of teaching your students and try it out on them for the next month, after which the students take the test again.

Fortunately, the students nail the exam this time and everyone is happy. You still have your job, but again in the middle of the night you start overthinking:

Did the students really perform well because of my new teaching method, or did this happen by chance and have nothing to do with my efforts? It is quite possible that it happened by chance (especially with a group as small as 30 students).

After thinking about it for a long time, you decide to get your hands dirty and employ some statistics.

1. First, you decide that comparing the average scores under the two conditions (old teaching method and new teaching method) is the way to go. The claim suggested by your observation (what has really happened), that the new method raises scores, is the alternative hypothesis. In terms of formulas:

Mean score (new) > Mean score (old)

2. Next, you ask yourself what would happen if the scores in the two cases followed the same distribution (a fancy term for a smoothed-out histogram), so that the new method made no difference. This is called your null hypothesis (another fancy term). Again in terms of formulas:

Mean score (new) = Mean score (old)

3. Now you devise a plan and say to yourself:

“I will assume the null hypothesis to be true” (my hypothesis, my rules)
“Then I will find the probability of seeing an improvement at least as large as the one that has really occurred” (using more maths). This probability is the p-value
If that probability is really low, chance alone is an unlikely explanation for the effect (the thing which has happened). Going by the terminology used, this is called a significant effect
But if that probability is not low, you cannot rule out that the effect occurred by chance
The threshold you compare your probability to is called alpha (or the significance level), and it is usually set at 0.05
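
The plan above can be tried out directly by simulation. The sketch below is a permutation test, not the t-test this post goes on to use: it assumes the null hypothesis by treating the "old" and "new" labels as interchangeable, shuffles them many times, and counts how often chance alone produces a gap in mean scores at least as large as the observed one. The scores, the function name `permutation_p_value`, and the seeds are all made up for illustration.

```python
import random
from statistics import mean

def permutation_p_value(old, new, n_shuffles=5000, seed=0):
    """One-sided p-value: how often does label shuffling give a gap >= the observed one?"""
    rng = random.Random(seed)
    observed = mean(new) - mean(old)
    pooled = list(old) + list(new)
    n_new = len(new)
    hits = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the "old"/"new" labels carry no information,
        # so any reshuffling of the pooled scores is equally plausible.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_new]) - mean(pooled[n_new:])
        if diff >= observed:
            hits += 1
    # The +1 counts the observed arrangement itself, so the estimate is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical scores for 30 students under each method (invented, not real data).
data_rng = random.Random(42)
old_scores = [data_rng.gauss(40, 10) for _ in range(30)]
new_scores = [data_rng.gauss(70, 10) for _ in range(30)]

alpha = 0.05
p_value = permutation_p_value(old_scores, new_scores)
print(f"p-value ~ {p_value:.4f}; significant at alpha={alpha}: {p_value < alpha}")
```

Comparing a group with itself, for example `permutation_p_value(old_scores, old_scores)`, should give a p-value near 0.5, which is what "no effect" looks like under this procedure.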

Now a glaring question comes into view: “How do you get this probability, the p-value, from your data?”

By using something called a hypothesis test. There are probably hundreds of hypothesis tests out there, and the real question is which one is best suited to your data. Choosing a test is a detailed discussion in its own right, and we’ll save it for another post. (Let us know in the comments if that is something you want.)

Test scores usually follow a normal distribution (note that this has been observed in populations), and the difference between the means of independent samples drawn from two normal distributions, once divided by its estimated standard error, follows a particular type of distribution called the t-distribution.

Without going into the details, two independent samples from two normal distributions can be compared using a t-test, and that is what we will use. This part involves a lot of calculation. Just remember that to get from data to a p-value, we need a distribution to compare against.

In short, we assume that the standardized difference in the sample mean scores (of the two conditions) follows the t-distribution under the null hypothesis, and apply the t-test to get a t-statistic, which can be translated into a probability: our p-value.
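
As a rough illustration of that last step, here is a minimal sketch of a pooled (equal-variance) two-sample t-test written with only the Python standard library. It follows the post's choice of an independent-samples test and asks the one-sided question "is the new mean greater than the old one?". The scores are invented, and the helper names `t_pdf`, `upper_tail`, and `pooled_t_test` are hypothetical; the tail probability is found by numerically integrating the t-distribution's density rather than by calling a statistics library.

```python
import math
import random
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T >= t), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The distribution is symmetric around 0, so half its mass lies above 0.
    tail = max(0.5 - total * h / 3, 0.0)
    return tail if t >= 0 else 1.0 - tail

def pooled_t_test(old, new):
    """Return (t statistic, degrees of freedom, one-sided p-value for new > old)."""
    n_old, n_new = len(old), len(new)
    df = n_old + n_new - 2
    pooled_var = ((n_old - 1) * variance(old) + (n_new - 1) * variance(new)) / df
    std_err = math.sqrt(pooled_var * (1 / n_old + 1 / n_new))
    t_stat = (mean(new) - mean(old)) / std_err
    return t_stat, df, upper_tail(t_stat, df)

# Hypothetical scores for 30 students under each method (invented, not real data).
data_rng = random.Random(42)
old_scores = [data_rng.gauss(40, 10) for _ in range(30)]
new_scores = [data_rng.gauss(70, 10) for _ in range(30)]

t_stat, df, p_value = pooled_t_test(old_scores, new_scores)
print(f"t = {t_stat:.2f}, df = {df}, one-sided p-value = {p_value:.3g}")
```

In practice you would more likely reach for a library routine; in recent SciPy versions, `scipy.stats.ttest_ind(new_scores, old_scores, alternative="greater")` should run the same kind of test. The hand-rolled version is only meant to show where the distribution enters the calculation.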

You do all this and see that the p-value is indeed less than 0.05, and say to yourself, “If my new method made no difference, an improvement this large would be very unlikely, so the result is significant at the 0.05 significance level.” Feeling proud of your achievement, you let sleep come to you.

