High Level Solution for Capacity Modelling CDMOs

‍

CDMOs operate under a constant stream of evolving requirements from their Pharma partners. When a new RFP arrives for small molecule manufacturing, it often includes hundreds of pages of technical specifications and complex process details. Reviewing and acting on these documents both quickly and accurately is a significant challenge.

To address this, Elucidata partnered with a small molecule CDMO to streamline their capacity modeling workflow and enable faster, more competitive proposal generation. In this blog, we share the high-level solution architecture behind this workflow and how it improves accuracy, traceability, and throughput.

Problem Statement

Manual and unscalable workflows: CDMO capacity and cost modeling relies heavily on spreadsheet-driven, manual processes that are difficult to scale and maintain as complexity increases.
Fragmented data and poor traceability: Critical inputs are gathered from multiple, disparate sources, and over time, it becomes difficult to trace where specific values originated or who approved them.
Error-prone extraction, calculations, and chemical verification: Errors commonly occur during data extraction and downstream calculations, and verification of chemical details, especially for novel or proprietary compounds, remains a significant challenge.
Compounding inaccuracies and low confidence: Inaccuracies introduced early propagate across stages, leading to unreliable capacity and cost estimates, reduced throughput, and diminished confidence in final outputs.

Key Challenges and Detailed Observations

1. Fragmented Inputs and Loss of Traceability

Capacity modeling requires inputs spanning chemical properties, manufacturing assumptions, contractual terms, timelines, and internal capacity constraints. In current workflows, these inputs are manually collected from multiple, disparate sources such as RFP documents, contracts, prior proposals, spreadsheets, and external references. This manual aggregation is time-consuming and introduces variability in how information is captured and interpreted.

As data moves across stages and between tools, traceability is gradually lost, making it difficult to determine where a specific value originated, whether it was inferred or explicitly stated, and who reviewed or approved it. This combination of manual collection and poor traceability makes validation difficult and weakens confidence in downstream capacity and cost estimates.

2.Manual, Spreadsheet-Centric Workflows

Spreadsheets remain the primary tool for storing data, performing calculations, and reviewing assumptions. While flexible, spreadsheets are not designed for complex, multi-stage capacity modeling involving validation rules, approvals, provenance tracking, and iterative refinement. As workflows grow in complexity, spreadsheet-based approaches become increasingly fragile, difficult to maintain, and hard to scale across teams or projects.

This reliance on spreadsheets also increases manual touchpoints, copying data between files, reconciling versions, and reapplying calculations, which slows throughput and makes the overall process unsuitable for complex and repeatable capacity modeling use cases.

3. Compounding Errors Across Stages

Capacity and cost estimates are inherently sequential: outputs from one stage become inputs to the next. When errors occur during early extraction or estimation steps, they propagate through later calculations without being detected. Over multiple stages, small inaccuracies compound into materially unreliable capacity commitments, timelines, and cost projections.

4. Chemical Data Accuracy for Novel Compounds

Novel or proprietary chemicals are frequently not available in standard datasets. General-purpose LLMs may provide plausible-looking but incorrect chemical information based on pattern matching from previously digested data. External APIs can also return incomplete or inconsistent results.

Without a deterministic validation layer and explicit cross-source verification, chemical properties such as molecular formula, molecular weight, density, etc, become a major source of downstream error in capacity modeling.

5. Throughput Limitations and Rigid Tooling

Current capacity modeling workflows involve multiple manual handoffs between people and tools, leading to slow turnaround times. Reviews and corrections often require moving data across spreadsheets, documents, or email threads, making it difficult to process proposals efficiently.

Additionally, existing tools can be rigid and prescriptive, offering limited flexibility to adapt workflows or approval steps to organization-specific needs. This forces teams to work around the tool rather than with it, further reducing throughput and scalability.

Proposed Solution Overview

We propose a configurable, AI-assisted, human-in-the-loop platform for capacity-first CDMO modeling. The platform is designed to automate repetitive tasks, reduce manual touchpoints, and improve throughput, while preserving transparency, validation, and human accountability at every critical step.

Rather than acting as a rigid product, the system will be built as a flexible solution that can be tailored to each organization’s capacity modeling processes, approval structures, and data standards, enabling teams to work more efficiently without compromising control or accuracy.

End-to-End Workflow

High-level architecture of CDMO capacity modeling workflow from RFP submission to final data storage. — High-Level Capacity Modelling for CDMO Solution Architecture

1. Structured Extraction with Defined Schemas

Users upload RFPs, technical documents, and contractual materials. The system extracts relevant information into a predefined structure covering areas such as descriptions, key attributes, requirements, proposed stages, procurement details, manufacturing details, yield assumptions, capacity and cost drivers, MSA terms, etc.

Each field is extracted using predefined prompts with high-quality examples, ensuring the output is consistently structured and easy to compare across proposals. Where appropriate, the AI can also add relevant details that are implied in the documents but not explicitly stated.

2. Transparency, Confidence, and Human Review

For every extracted field, the system explicitly indicates whether the value was AI-populated or manually entered, displays the source(s) used, and assigns a confidence or accuracy level. Fields with low confidence are flagged for attention and manual verification.

Certain sections, such as manufacturing details, capacity commitments, and contractual terms, require mandatory approval from qualified personnel before the workflow can proceed. At each step, users can interact with the AI to refine or correct extracted values.

3. Chemical Validation and Property Resolution

AI-assisted chemical validation workflow verifying CAS IDs and chemical properties. — Chemical Validation Flow

Chemical detail handling is implemented as a deterministic validation pipeline, rather than a free-form AI lookup process. The system initially queries authoritative sources such as PubChem to identify the chemical and retrieve associated identifiers and metadata. Since CAS identifiers obtained from PubChem may occasionally be incomplete or inaccurate, these identifiers are validated against additional trusted sources such as the CAS Registry, ChemicalBook, and Sigma-Aldrich.

Once a CAS identifier is confirmed, chemical properties are retrieved from multiple authoritative sources and cross-checked for consistency. Any discrepancies automatically trigger a mandatory manual verification step. If a property cannot be reliably established from verified sources, it is explicitly marked as unknown rather than inferred or approximated.

For novel or proprietary chemicals where external references are unavailable or incomplete, the system relies on fundamental chemical definitions. Molecular weight is calculated directly from the molecular formula, and derived properties (such as density, where applicable) are computed using validated calculation methods or flagged for expert review when assumptions are involved. No speculative values are introduced.

All verified and approved chemical information, whether externally sourced or internally derived, is stored in an internal, curated chemical database. This enables future proposals to reuse validated chemical data without repeating the full identification, validation, and review process, while maintaining full traceability and auditability.

4. Capacity and Cost Modeling

Capacity modeling is treated as the primary objective, with cost modeling derived from validated capacity assumptions. Time and cost estimates are suggested based on a combination of verified internal datasets and historical high-quality RFPs.

The AI leverages knowledge of past successful proposals to populate realistic estimates, but final commitments require human approval. All assumptions remain visible, editable, and traceable.

5. Model Strategy and Continuous Evaluation

Different models may be used for different tasks, such as extraction, chemical validation, and estimation. Users can compare outputs across models to assess quality. We continuously evaluate newer tools and models, promoting those that demonstrate improved accuracy or performance.

6. Monitoring, Feedback, and Improvement

The platform tracks usage and performance metrics, including the number of RFPs processed, time saved per RFP, user correction frequency, and response times. User feedback and corrections are incorporated into the extraction and estimation logic over time.

Annual performance reports compare baseline performance with year-over-year improvements, enabling continuous optimization.

7. Support and Maintenance

The solution includes a dedicated support channel (email, Slack, or Teams), a 24-hour response SLA for escalations and bug reports, and bi-weekly maintenance sprints for fixes and enhancements.

8. Security and Compliance

The platform can be designed to meet SOC 2 compliance requirements, building on prior experience delivering SOC 2–compliant systems. This includes implementing robust access controls, role-based permissions, comprehensive audit logging, and encryption of data at rest, along with secure key management & incident response management. The team also has prior experience in HIPAA-regulated environments and is capable of implementing systems that meet similar compliance and security expectations where required.

‍