Knowledge graph scoring is what turns a connected dataset into a decision tool. When teams first adopt a biomedical knowledge graph, the early thrill is connectivity - omics files, phenotype tables, drug–target lists, and clinical summaries finally speak the same language. But the very next challenge appears: too much is connected.
Take a common respiratory indication such as asthma. Curated sources like OMIM list a core set of associated genes, and broader association databases add hundreds more signals. If you just project those links into a graph, you’ll get long, alphabetized lists that all “look” equally plausible. Without scoring, scientists still face the same dilemma: where do we begin?
Scoring is the layer that turns a graph from a static map of relationships into a decision-making tool. It encodes your priorities - novelty vs. validation, repurposing vs. new discovery - and translates them into ranked outputs that scientists can act on.
Take the example of drug repurposing. Here, the most valuable hits are drugs that also modulate genes beyond their primary target, where those genes are already linked to other indications. If the goal is entirely new targets, the priorities flip: the emphasis shifts to novelty, under-explored genes, and sparse prior evidence. The same dataset, two very different shortlists.
Polly KG was designed with this premise: every connection should be both visible and rankable - and the way we rank should faithfully reflect the problem you’re solving.
There isn’t a universal “best” score. What counts as “high value” depends on the scientific goal.
Polly KG uses a two-tier scheme:
Base-only (novelty-leaning): ranks every edge on its intrinsic evidence - trial phase, number of known drugs, tractability - with no disease bias applied.
Biased to asthma: layers an explicit disease weight on top of the base score, so edges backed by asthma-linked evidence rise to the top.
Same graph, same evidence - different lens, different shortlist.
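The two tiers can be sketched in a few lines. This is an illustrative toy, not Polly KG's actual scoring code: the component names, weights, and the 1.5x disease boost are assumptions chosen to show the mechanic, with each component assumed pre-normalized to [0, 1].

```python
# Two-tier scoring sketch: a base (novelty-leaning) score from intrinsic
# evidence, and a biased score that re-weights toward a disease of interest.
# All names, weights, and the boost factor are illustrative assumptions.

def base_score(edge):
    """Novelty-leaning base score from intrinsic evidence components."""
    return (
        0.4 * edge["trial_phase"]         # later phase -> more clinical evidence
        + 0.3 * edge["tractability"]      # druggability history
        + 0.3 * (1 - edge["drug_count"])  # fewer known drugs -> more novel
    )

def biased_score(edge, disease="asthma"):
    """Same evidence, boosted when the edge is linked to the target disease."""
    boost = 1.5 if disease in edge["linked_diseases"] else 1.0
    return base_score(edge) * boost

edges = [
    {"gene": "GENE_A", "trial_phase": 0.2, "tractability": 0.5,
     "drug_count": 0.1, "linked_diseases": {"asthma"}},
    {"gene": "GENE_B", "trial_phase": 0.9, "tractability": 0.8,
     "drug_count": 0.9, "linked_diseases": {"copd"}},
]

base_rank = sorted(edges, key=base_score, reverse=True)
asthma_rank = sorted(edges, key=biased_score, reverse=True)
print([e["gene"] for e in base_rank])    # ['GENE_B', 'GENE_A']
print([e["gene"] for e in asthma_rank])  # ['GENE_A', 'GENE_B']
```

Note that the two rankings disagree on the same two edges: the asthma bias promotes the under-drugged, disease-linked gene that the base score ranked second.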
Scores aren’t just math; they’re judgment, expressed numerically.
When novelty is the aim, you up-weight signals that mark under-explored biology: preclinical evidence over approved drugs, genes with few known drugs over genes with many, and a gentle down-weight on well-trodden targets. That inevitably pushes some highly drugged, repurposable genes down the list. These are not right-or-wrong decisions; they are trade-offs that should be deliberate and aligned with the problem at hand.
Flip the program to repurposing, and the logic flips with it. You privilege well-characterized targets, multiple independent lines of evidence, and tractability histories. A gene might rise precisely because human evidence and clinical tooling are deep (and potentially portable), even though it wouldn’t qualify as “novel.”
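Flipping the program can be as concrete as swapping a weight profile. In this sketch the profile names, feature names, and weights are hypothetical; the point is that the same gene features produce opposite rankings under the two intents:

```python
# Hypothetical weight profiles: same features, opposite intent.
# A negative weight on known_drugs penalizes well-drugged genes under
# "novelty"; the same feature is rewarded under "repurposing".
WEIGHTS = {
    "novelty":     {"human_evidence": 0.2, "known_drugs": -0.3, "preclinical_signal": 0.5},
    "repurposing": {"human_evidence": 0.5, "known_drugs": 0.4,  "preclinical_signal": 0.1},
}

def score(gene, profile):
    w = WEIGHTS[profile]
    return sum(w[k] * gene[k] for k in w)

well_drugged  = {"human_evidence": 0.9, "known_drugs": 0.9, "preclinical_signal": 0.2}
underexplored = {"human_evidence": 0.1, "known_drugs": 0.0, "preclinical_signal": 0.9}

print(score(well_drugged, "novelty"), score(underexplored, "novelty"))
print(score(well_drugged, "repurposing"), score(underexplored, "repurposing"))
```

Under "novelty" the under-explored gene wins; under "repurposing" the well-drugged one does - the trade-off made explicit in four numbers per profile.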
The deeper lesson is that scoring must always be anchored in context. There is no single formula that works for every program. A discovery team and a translational team will want different things from the same graph. Polly KG makes that flexibility possible.
Another critical principle is that no single data type should dominate. A GWAS hit by itself is rarely enough to justify attention, just as a gene expression spike in one dataset might mean little without supporting evidence.
Scoring in Polly KG is designed to compound across modalities - clinical trials, gene expression, genetic variants, phenotypes - so that no decision rests on a lone signal.
Take the case of a gene–drug connection. Its score may combine the trial phase, number of drugs already linked, and tractability as baseline evidence. On top of that, it may be reinforced by expression data showing upregulation in disease tissue, or by genetic evidence from GWAS studies. The result is not a binary yes/no association but a layered ranking that reflects the breadth and depth of evidence.
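One simple way to implement that compounding is a weighted sum over modalities with each signal clamped to a bounded range, so no single modality can dominate. The modality names and weights below are illustrative assumptions, not Polly KG internals:

```python
# Evidence compounding sketch: each modality contributes a bounded increment,
# so a lone strong signal cannot outrank convergent multi-modal evidence.
MODALITY_WEIGHTS = {"clinical_trial": 0.35, "expression": 0.25,
                    "genetics": 0.25, "phenotype": 0.15}

def edge_score(evidence):
    """Weighted sum over modalities present on a gene-drug edge.
    Each modality's signal is clamped to [0, 1] to keep it bounded."""
    total = 0.0
    for modality, weight in MODALITY_WEIGHTS.items():
        signal = min(max(evidence.get(modality, 0.0), 0.0), 1.0)
        total += weight * signal
    return total

lone_gwas  = {"genetics": 1.0}  # one maximal signal, nothing else
convergent = {"clinical_trial": 0.6, "expression": 0.7, "genetics": 0.6}

print(edge_score(lone_gwas))    # caps out at the genetics weight, 0.25
print(edge_score(convergent))   # three moderate signals compound to 0.535
```

Even a maximal GWAS signal on its own is capped by its modality weight, while three moderate, convergent signals outrank it - confidence lives in convergence.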
This multi-modal compounding is what transforms a biomedical knowledge graph from an information store into a drug discovery knowledge graph - a decision engine that scientists can trust.
A Boston-based biotech exploring cross-species signals from a non-model organism needed to surface human targets for three indications. We enriched the base Polly KG with the company’s in-house datasets alongside selected public resources, harmonizing them for cross-species mapping. On top, we enabled natural-language querying so biologists could ask questions directly, and working with the team, designed a multiparameter scoring framework aligned to their criteria. Within the first month, 12 users ran more than 400 queries, and the program prioritized five targets that moved into wet-lab validation - an example of how a tailored scoring layer turns dense connectivity into focused action.
A graph is only useful if people can query it in seconds. In one engagement, modeling sample-level connections pushed a graph from ~16 GB to ~27 GB; traversals that used to finish in seconds started timing out. The fix wasn’t bigger machines; it was smarter aggregation (e.g., by tissue or species) and filtering out low-confidence edges that didn’t serve the use case. Scientists kept the details they needed, and the speed they expected.
This is a scoring lesson, too: the same instinct that prunes structure should prune evidence. High-noise modalities should contribute less until they earn their keep.
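The aggregation-plus-floor move described above can be sketched directly. The field names and the 0.5 confidence floor are illustrative assumptions; the shape of the operation - collapse per-sample edges to one edge per (gene, tissue), then drop groups below the floor - is the point:

```python
# Sketch: collapse sample-level edges to one aggregate edge per (gene, tissue),
# pruning groups whose mean confidence falls below a minimum-evidence floor.
from collections import defaultdict
from statistics import mean

def aggregate_edges(sample_edges, min_confidence=0.5):
    """Group per-sample edges by (gene, tissue); keep groups that clear the floor."""
    groups = defaultdict(list)
    for e in sample_edges:
        groups[(e["gene"], e["tissue"])].append(e["confidence"])
    return {
        key: mean(confs)
        for key, confs in groups.items()
        if mean(confs) >= min_confidence
    }

edges = [
    {"gene": "GENE_A", "tissue": "lung", "confidence": 0.8},
    {"gene": "GENE_A", "tissue": "lung", "confidence": 0.6},
    {"gene": "GENE_B", "tissue": "lung", "confidence": 0.2},
]
print(aggregate_edges(edges))  # GENE_B's low-confidence group is pruned
```

Three sample-level edges become one aggregate edge: the graph shrinks, traversals speed up, and the low-confidence signal stops polluting the ranking.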
In oncology, a Boston-based precision-medicine company working in AML needed a comprehensive, multi-omics view to reduce downstream risk and accelerate decisions. We assembled an atlas of roughly 10,000 AML-specific human samples from more than 10 public sources, harmonized them, and fed the data into a multi-modal knowledge graph. A custom scoring framework then ranked differentiation-based targets for experimental follow-up, and public cancer-cell-line data helped guide validation. Over six months, the team identified and validated two targets, accelerated target ID by about 4X, and advanced a candidate that subsequently received FDA Fast Track designation - evidence that a scoring layer can compress time from data to decision.
Scoring rules should evolve with the science without surprising downstream users. The way to do that is simple: version your scoring schema (e.g., scoring_v0.3), keep a one-line note on what changed and why, and run a quick “rank-delta” check before and after any material update. If three of the top 20 move, look at the reasons; if the shifts reflect your intent (say, a tighter GWAS threshold or a stronger disease bias), proceed with scientist sign-off. Add basic guardrails - minimum evidence floors - so no single modality can hijack the list.
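A rank-delta check is a few lines of set arithmetic. The version labels and scores below are hypothetical; the function just reports which entities enter or leave the top-N between two scoring runs:

```python
# Sketch of a "rank-delta" check between two scoring schema versions
# (e.g., scoring_v0.2 vs scoring_v0.3). Scores and gene names are made up.

def rank_delta(old_scores, new_scores, top_n=20):
    """Return the genes that enter or leave the top-N when the schema changes."""
    top = lambda s: set(sorted(s, key=s.get, reverse=True)[:top_n])
    old_top, new_top = top(old_scores), top(new_scores)
    return {"entered": new_top - old_top, "left": old_top - new_top}

old = {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.7, "GENE_D": 0.1}
new = {"GENE_A": 0.9, "GENE_B": 0.3, "GENE_C": 0.7, "GENE_D": 0.8}

delta = rank_delta(old, new, top_n=3)
print(delta)  # {'entered': {'GENE_D'}, 'left': {'GENE_B'}}
```

If the swap reflects your intent (here, say, a tighter threshold demoted GENE_B), you proceed with sign-off; if not, the delta is the audit trail that catches it before users see a surprising list.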
Equally important, keep the system explainable. If a gene sits at #1, everyone should know why. In Polly KG, base components (trial phase, drug count, tractability) are visible, and disease biases are explicit (“this list is asthma-weighted”). That clarity lets discovery and translational teams negotiate the weights, not the data - and it’s what builds trust in the list.
Validation as evidence. When bench results arrive - say, a perturbation screen that consistently shifts a fibrosis marker - those readouts should land on edges as first-class signals. In ranking, “validated in our hands” should outrun “promising in silico.”
AI-assisted thresholds, human-audited. Machine learning can suggest cut-points (where a p-value or effect-size threshold best separates signal from noise) or propose cross-modal weightings. Domain experts should own those decisions after inspecting suggestions against known biology and program risk.
Topology-aware features and link prediction. Graph-native signals - path length, node degree, evidence density - are natural next layers. They can surface indirect but compelling routes (e.g., short, high-confidence paths connecting a drug class to an unexpected phenotype) and feed a principled link-prediction module.
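Those graph-native features are cheap to compute even without a graph engine. A minimal sketch, assuming a toy adjacency-dict graph (the beta-agonist/ADRB2/bronchodilation nodes are illustrative): node degree is a dictionary lookup, and shortest-path length is a breadth-first search.

```python
# Topology features over a toy adjacency dict: degree and BFS path length.
# A production system would use a graph database or library; the features
# themselves are this simple.
from collections import deque

graph = {  # toy undirected graph: drug class -> target -> phenotype
    "beta_agonists": ["ADRB2"],
    "ADRB2": ["beta_agonists", "bronchodilation"],
    "bronchodilation": ["ADRB2"],
    "GENE_X": [],  # disconnected node
}

def degree(node):
    """Number of edges touching the node - a proxy for how well-studied it is."""
    return len(graph.get(node, []))

def path_length(src, dst):
    """Breadth-first search; returns hop count, or None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

print(degree("ADRB2"))                                # 2
print(path_length("beta_agonists", "bronchodilation"))  # 2 hops
```

Short, high-confidence paths (low hop counts through well-evidenced edges) are exactly the kind of feature a link-prediction module can consume.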
Most first-time graph programs stumble the same way: graphs too big to be interactive, opaque or rigid scoring, one-size-fits-all formulas that force teams to compromise. A better pattern:
Keep the graph queryable (aggregate where it doesn’t hurt science).
Keep the scores transparent (so users can argue weights, not the data).
Let intent drive bias (base + biased scores, not endless forks).
Compound across modalities (confidence lives in convergence).
Do that, and your knowledge graph stops being a static atlas of associations and becomes a compass - one that points your team to the next experiment, not just the next edge.
Scoring doesn’t just organize your graph - it tells you where to go next.