AI Labs

Navigating the Landscape of Agentic AI in Computational Biology

Whether you are looking for an autonomous scientist to architect a CRISPR screen or a reliable digital analyst to run a spatial transcriptomics pipeline, the rise of "BioAgents" is rapidly reshaping drug discovery. These agents are no longer just simple LLMs: they integrate massive biological data lakes with specialized toolsets to perform complex reasoning. But as the field fills with contenders, a critical question remains: which of these agents can actually do the math, and which ones are just "hallucinating" a PhD? In this deep dive, we benchmark three of the industry’s most talked-about AI agents (Stanford’s broad-spectrum Biomni, Genentech’s precision-focused SpatialAgent, and the workflow-centric Polly BioAgent) to see how they handle real-world spatial analysis and literature-mining tasks.

  • Biomni by Stanford: A broad, general-purpose agent. It dynamically composes workflows across 25 different biomedical subfields (from CRISPR to drug discovery to clinical genomics) without relying on predefined templates. It integrates an 11GB data lake, 150 tools, and 59 databases.
  • SpatialAgent by Genentech: Optimized for spatial biology and single-cell RNA-seq. Unlike Biomni’s template-free approach, SpatialAgent uses a hybrid of adaptive reasoning and 17 predefined "skill templates" (playbooks for tasks like cell-cell communication or gene panel design) to ensure high-fidelity outputs in complex spatial tasks.
  • Polly BioAgent: Focuses on workflow orchestration, structured deliverables, reproducibility, and strict safety guardrails for standard bioinformatics pipelines (like bulk RNA-seq and differential expression).

Evaluating Mathematical Execution and Biological Reasoning

To benchmark these agents, we designed two tasks that probe the limits of their capabilities, testing both mathematical execution and biological reasoning.

Task 1: Spatial Transcriptomics Analysis

Prompt: Download the 10x Genomics Visium spatial transcriptomics dataset for the adult mouse brain (sample_id='V1_Adult_Mouse_Brain'). Once loaded, perform spatial clustering to identify distinct anatomical tissue domains. Generate a spatial scatter plot of the tissue coordinates colored by these identified clusters. Next, calculate spatially variable genes and infer ligand-receptor cell-cell communication networks between the distinct spatial domains using an appropriate Python spatial library (like Squidpy or CellPhoneDB). Output a summary report of the top 3 most active signaling pathways and save the interaction matrix to cci_matrix.csv.
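The prompt's last two deliverables (a top-3 pathway summary and cci_matrix.csv) can be sketched in a few lines. The version below is a minimal, hypothetical illustration, not code produced by any of the agents. It assumes a CellPhoneDB-style "mean of means" score (the average of the ligand's mean expression in the sender domain and the receptor's mean expression in the receiver domain), ranks pathways by an assumed aggregation (summing scores across all domain pairs), and writes the table in long format, one row per ligand-receptor pair and domain pair. All gene names, domains, and pathway labels are toy placeholders, and the permutation test that Squidpy and CellPhoneDB use to assess significance is deliberately omitted.

```python
import csv
from collections import defaultdict

import numpy as np

# Hypothetical toy data (not from the benchmark): 4 spots x 6 genes, two spatial domains.
TOY_GENES = ["L1", "R1", "L2", "R2", "L3", "R3"]
TOY_EXPR = np.array([
    [2.0, 0.0, 1.0, 0.0, 0.0, 1.0],  # spot in domain D1
    [4.0, 0.0, 1.0, 0.0, 0.0, 1.0],  # spot in domain D1
    [0.0, 3.0, 0.0, 1.0, 2.0, 0.0],  # spot in domain D2
    [0.0, 5.0, 0.0, 1.0, 2.0, 0.0],  # spot in domain D2
])
TOY_LABELS = np.array(["D1", "D1", "D2", "D2"])
# (ligand, receptor, pathway) triples; pathway names are placeholders.
TOY_LR_PAIRS = [("L1", "R1", "P1"), ("L2", "R2", "P2"), ("L3", "R3", "P3"), ("L1", "R3", "P4")]

FIELDS = ["ligand", "receptor", "pathway", "sender", "receiver", "score"]


def score_interactions(expr, genes, labels, lr_pairs):
    """Score every L-R pair for every sender -> receiver domain pair.

    Score = mean(ligand mean in sender, receptor mean in receiver), a CellPhoneDB-style
    "mean of means". Simplified: no expression-fraction filter, no permutation p-values.
    """
    domains = sorted(set(labels.tolist()))
    # Mean expression of each gene within each domain (rows follow `domains`).
    means = np.vstack([expr[labels == d].mean(axis=0) for d in domains])
    col = {g: i for i, g in enumerate(genes)}
    rows = []
    for ligand, receptor, pathway in lr_pairs:
        for s, sender in enumerate(domains):
            for r, receiver in enumerate(domains):
                lig = means[s, col[ligand]]
                rec = means[r, col[receptor]]
                # Zero unless both partners are expressed in their respective domains.
                score = (lig + rec) / 2 if lig > 0 and rec > 0 else 0.0
                values = (ligand, receptor, pathway, sender, receiver, float(score))
                rows.append(dict(zip(FIELDS, values)))
    return rows


def top_pathways(rows, k=3):
    """Rank pathways by their summed score across all domain pairs."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["pathway"]] += row["score"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


def write_matrix(rows, path="cci_matrix.csv"):
    """Save the long-format interaction table as CSV."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    rows = score_interactions(TOY_EXPR, TOY_GENES, TOY_LABELS, TOY_LR_PAIRS)
    write_matrix(rows)
    for pathway, total in top_pathways(rows):
        print(pathway, total)
```

On the toy data the ranking comes out as P1 (3.5), P4 (2.0), P3 (1.5). In a real analysis the ligand-receptor pairs and pathway labels would come from a curated resource, and only interactions that pass a significance test would be counted; a wide sender × receiver matrix per pathway is an equally valid layout for the CSV.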

Rationale: This task tests the agents' capacity to handle specialized, multimodal data. Spatial transcriptomics requires integrating gene expression matrices with the physical coordinates of each spot in the tissue, so the task exposes whether an agent actually understands domain-specific bioinformatics methods (e.g., spatial autocorrelation metrics such as Moran's I) or lazily falls back on generic data-science approximations (e.g., the coefficient of variation, which measures how much a gene varies but ignores where in the tissue that variation occurs).
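To illustrate the distinction drawn above, here is a minimal numpy sketch (not any agent's actual code) that scores two hypothetical genes on a toy 4 × 4 grid of spots. Both genes take the same eight high and eight low values, so their coefficient of variation is identical; only Moran's I separates the spatially clustered gene from the checkerboard one. The binary rook-adjacency weights are a simplifying assumption: tools such as Squidpy instead build the neighbor graph from the actual spot coordinates.

```python
import numpy as np


def rook_weights(n_rows, n_cols):
    """Binary rook-adjacency weights for spots on a regular n_rows x n_cols grid."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    return w


def morans_i(x, w):
    """Global Moran's I: (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def coefficient_of_variation(x):
    """Standard deviation over mean; blind to where each spot sits in the tissue."""
    x = np.asarray(x, dtype=float)
    return x.std() / x.mean()


# Two hypothetical genes with identical values, arranged differently on a 4 x 4 grid
# (flattened row by row, matching the indexing in rook_weights).
CLUSTERED = np.array([[1, 1, 0, 0]] * 4).ravel()      # left half "on": a spatial domain
CHECKER = np.indices((4, 4)).sum(axis=0).ravel() % 2  # alternating spots: no domain
W = rook_weights(4, 4)

if __name__ == "__main__":
    for name, gene in (("clustered", CLUSTERED), ("checkerboard", CHECKER)):
        cv = coefficient_of_variation(gene)
        print(f"{name}: CV = {cv:.2f}, Moran's I = {morans_i(gene, W):.2f}")
```

Running it prints a CV of 1.00 for both genes but a Moran's I of 0.67 for the clustered gene versus -1.00 for the checkerboard, which is exactly the spatial signal a CV-based shortcut discards.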

Task 2: Literature Mining & Drug Repurposing

Prompt: Take the following list of upregulated genes in Alzheimer's disease: APOE, TREM2, CD33, and CLU. Programmatically query biological databases to identify enriched GO terms and KEGG pathways for this specific gene set. Based on the identified pathways, suggest 3 FDA-approved drugs that could potentially be repurposed to target this signaling cascade. Provide a detailed markdown report of your reasoning, including database IDs or literature references to support your hypothesis.
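Many GO and KEGG enrichment tools report an over-representation p-value based on the hypergeometric distribution (equivalently, a one-sided Fisher's exact test), followed by a multiple-testing correction such as Benjamini-Hochberg. The sketch below computes both with the Python standard library. It is an illustration under those assumptions, not the method any of the benchmarked agents used, and the term names, gene sets, and universe size in the example are hypothetical placeholders rather than real annotations for APOE, TREM2, CD33, and CLU.

```python
from math import comb


def hypergeom_pvalue(overlap, term_size, query_size, universe_size):
    """P(X >= overlap) when drawing query_size genes from a universe of universe_size,
    of which term_size carry the annotation (one-sided over-representation test)."""
    upper = min(query_size, term_size)
    hits = sum(
        comb(term_size, i) * comb(universe_size - term_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(universe_size, query_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(query, annotations, universe_size):
    """Test each term in `annotations` (term -> set of genes) for over-representation."""
    query = set(query)
    terms = sorted(annotations)
    pvals = [
        hypergeom_pvalue(len(query & annotations[t]), len(annotations[t]),
                         len(query), universe_size)
        for t in terms
    ]
    # Rows of (term, p-value, BH q-value), most significant first.
    return sorted(zip(terms, pvals, benjamini_hochberg(pvals)), key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical annotation sets and universe size, for illustration only.
    toy_terms = {"term_A": {"G1", "G2", "G3", "G4", "G5"}, "term_B": {"G1", "G9", "G10"}}
    for term, p, q in enrich(["G1", "G2", "G3", "G4"], toy_terms, universe_size=20):
        print(term, round(p, 5), round(q, 5))
```

With these toy counts, term_A (4 of its 5 genes in the 4-gene query) comes out at p ≈ 0.001 and term_B at p ≈ 0.51. The background universe is an explicit parameter because the p-value depends strongly on it, and with a query of only four genes a single shared gene can move a term in or out of significance.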

Rationale: Bioinformatics ultimately requires synthesizing results into actionable biological insights. This task isolates the agents' retrieval-augmented generation (RAG) and reasoning capabilities. Because the agents must cite specific literature and database IDs, the task also serves as a strict test of factual grounding, exposing any tendency to hallucinate scientific citations.
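One inexpensive first line of defense against fabricated references is to check that every identifier in a report is at least well-formed before verifying it against the source database. The sketch below assumes three identifier styles: GO terms ("GO:" followed by seven digits), KEGG human pathway maps ("hsa" followed by five digits), and PubMed IDs written with a "PMID:" prefix (the prefix convention is an assumption about the report format). A well-formed ID can still be fabricated or irrelevant, so this only filters obvious errors; existence and relevance still have to be confirmed against the databases themselves.

```python
import re

# Loose pattern: anything that *looks like* one of the assumed ID types.
CANDIDATE = re.compile(r"\b(GO:\w+|hsa\w+|PMID:\s*\w+)")

# Strict patterns: what a well-formed ID of each type is assumed to look like.
WELL_FORMED = {
    "GO": re.compile(r"GO:\d{7}"),       # GO term: "GO:" + 7 digits
    "KEGG": re.compile(r"hsa\d{5}"),     # KEGG human pathway map: "hsa" + 5 digits
    "PMID": re.compile(r"PMID:\s*\d+"),  # PubMed ID, assuming a "PMID:" prefix
}


def audit_ids(report):
    """Split every ID-like token in `report` into well-formed and malformed lists.

    Passing this check does NOT mean the ID exists or supports the claim;
    that still requires a lookup against the source database.
    """
    result = {"well_formed": [], "malformed": []}
    for token in CANDIDATE.findall(report):
        ok = any(pattern.fullmatch(token) for pattern in WELL_FORMED.values())
        result["well_formed" if ok else "malformed"].append(token)
    return result


if __name__ == "__main__":
    # Made-up report text for illustration.
    text = "See GO:0008150, hsa04010 and PMID: 12345. Also GO:12345 and hsa4010."
    print(audit_ids(text))
```

In this example, GO:12345 and hsa4010 are flagged as malformed, while GO:0008150 passes the format check regardless of whether it actually supports the claim it is attached to.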

Results

Key Findings and Final Insights

  • Biomni turned out to be a jack of all trades, master of none: it struggled with niche methodologies unless explicitly guided step by step, and it treated specialized data like ordinary tabular data.
  • SpatialAgent was the clear winner for mathematical and algorithmic rigor within its domain, but its tendency to hallucinate literature citations is dangerous if its output feeds unverified downstream research.
  • Polly BioAgent trades some computational depth, such as its 51-pair shortcut for cell-cell interaction (CCI) inference, for execution stability, end-to-end traceability, and factual grounding. Of the three, it behaves most like a reliable, structured analyst.

Also read: CellAtria vs Polly BioAgent, an in-depth comparison exploring how ingestion-focused agents differ from autonomous bioinformatics workbenches, and what that means for real-world single-cell analysis workflows.
