```mermaid
flowchart LR
subgraph Sites["Clinical Sites"]
EHR[Electronic Health Records]
Wearables[Wearables & Sensors]
Patient[Patient Portal]
end
subgraph Core["Core Data Platforms"]
EDC[EDC<br/>Electronic Data Capture]
CTMS[CTMS<br/>Clinical Trial Management]
eTMF[eTMF<br/>Trial Master File]
RTSM[RTSM<br/>Randomization & Supply]
end
subgraph AI["AI Layer"]
Design[Protocol Design AI]
Recruit[Patient Matching AI]
QC[Data Quality AI]
Predict[Predictive Analytics]
end
subgraph Outputs["Regulatory & Insights"]
Submit[Regulatory Submissions]
Reports[Real-time Dashboards]
Risk[Risk Signals]
end
EHR -->|Patient Data| EDC
Wearables -->|Biometrics| EDC
Patient -->|ePRO/eCOA| EDC
EDC <--> CTMS
CTMS <--> eTMF
CTMS <--> RTSM
EDC --> QC
CTMS --> Predict
eTMF --> QC
Design --> EDC
Recruit --> CTMS
QC --> Reports
Predict --> Risk
eTMF --> Submit
EDC --> Submit
```
# 23 AI in Trials
This chapter examines how technology and methodological innovation are reshaping clinical trials. The scope is intentionally broad, covering three interconnected developments: the IT infrastructure that now underpins trial operations, the AI and automation tools being layered on top of that infrastructure, and the emerging methodological approaches—real-world evidence, digital twins, synthetic controls—that are changing how trials are designed and how evidence is generated.
These topics belong together because they share a common theme: the shift from trials as primarily manual, paper-based endeavors to trials as data-intensive, computationally-mediated systems. The IT platforms determine what data can be captured and integrated. AI tools determine what can be automated or predicted. Methodological innovations determine what kinds of evidence regulators will accept. A decision-maker evaluating a trial strategy must understand all three.
The chapter proceeds as follows. We begin with the technology ecosystem—the core platforms (EDC, CTMS, eTMF, RTSM) that form the transactional backbone of modern trials, and the market dynamics shaping their evolution. We then examine AI applications across the trial lifecycle: study design and protocol optimization, patient recruitment and site selection, and operational data quality. Next, we address emerging methodological approaches—real-world evidence, federated learning, digital twins, and synthetic control arms—that represent alternatives or complements to traditional randomized controlled designs. Finally, we consider the economic and organizational implications: how these technologies may reshape cost structures, CRO relationships, and the clinical research workforce.
Throughout, we maintain a realistic perspective. Technology can reduce friction and surface risks earlier, but it requires validation, governance, and oversight to be defensible in an inspection. Methodological innovations can improve efficiency, but they introduce new sources of uncertainty and require careful regulatory engagement. The goal is to help readers understand not only what is changing, but how to evaluate whether a given tool or approach is appropriate for regulated clinical research.
Clinical trials now run on a layered digital infrastructure: core transactional systems (EDC, CTMS, eTMF, RTSM), integration middleware, and analytics/automation services that continuously convert operational events into risk signals and decisions. This shift has been accelerated by cloud adoption and by regulatory acceptance of technology-enabled trial conduct, including decentralized trial elements and digital health technologies for remote data acquisition (U.S. Food and Drug Administration 2024, 2023d).
The next transition is conceptual: from automation as a feature (dashboards, rules engines, and isolated machine-learning models) to automation as an operating model, in which software agents can plan, execute, and verify multi-step workflows across the clinical operations stack. This shifts the trial from a linear “pipeline” towards a dynamic, electronic map of activities, where information infrastructure identifies bottlenecks and navigates the “diverse web of iterative learning loops” that characterize modern drug development (Wagner et al. 2018). In this chapter, “agentic AI” is treated as a design pattern—built from “compound” systems that combine models with retrieval, tools, and control logic—grounded in foundational computer science work on “generative agents” that simulate human-like behavior and goal-directed action (Zhang, Chen, and Oney 2023; Berkeley Artificial Intelligence Research 2024). Recent systematic reviews of studies from 2024-2025 indicate that agentic systems can improve clinical task performance by up to 60 percentage points over base language models, particularly in evidence retrieval and task planning (Abou Ali, Dornaika, and Charafeddine 2026).
The clinical question is not whether agents can draft text, but whether they can operate within a regulated environment: preserving data integrity, producing audit-ready traces, and remaining under accountable human oversight—a framework anchored by the EMA’s 2024 reflection paper, which sets out risk-based expectations for the use of AI in drug development (Unlearn.AI 2025; European Medicines Agency 2024; National Institute of Standards and Technology 2023; Amershi et al. 2019). This technological shift converges with the vision of Digital Twin “Moonshots”, which aim to integrate personalized digital twins directly into medical records to optimize both clinical care and trial participation (Duke Center for Virtual Imaging Trials 2024).
The underlying market dynamics reflect a real shift in how trials are run. The clinical trial software market has grown substantially—estimated at over USD 11 billion in 2024—and continues to expand at a compound annual growth rate exceeding 10% (Grand View Research 2025). AI applications in clinical trials are growing rapidly (Fortune Business Insights 2024). These trends should not be taken as evidence of clinical benefit on their own, but they help explain why sponsors, CROs, and platform vendors are reorganizing workflows around automation and AI-mediated operations.
## 23.1 The Modern Clinical Trial Ecosystem
The clinical trial technology stack has evolved from disconnected tools into an integrated ecosystem that powers every stage of research, from site operations to regulatory submissions. A decade ago, sponsors managed clinical data through a patchwork of vendor systems that rarely communicated: EDC databases that could not talk to randomization systems, trial master files stored in SharePoint folders with manual indexing, and clinical trial management systems that required spreadsheet reconciliation to produce accurate enrollment counts. Data flowed through exports, imports, and emails—a process that introduced latency, transcription errors, and audit risk at every handoff.
Today, the leading platforms aspire to unified architectures where patient data, operational metrics, essential documents, and supply chain signals flow through shared data models. In the best-integrated environments—particularly single-vendor platforms like Veeva Vault—when a site randomizes a patient, that event can propagate automatically: enrollment counts update in CTMS dashboards, treatment-specific CRF pages unlock in the EDC, drug shipment requests trigger in the supply management system, and expected document checklists populate in the eTMF. In practice, many sponsors still operate heterogeneous stacks with systems from multiple vendors, connected through middleware and custom integrations that require configuration, maintenance, and periodic reconciliation. The degree of integration varies widely across organizations, but the direction of travel is toward reduced manual handoffs and the shared data infrastructure on which analytics and automation depend.
### The Technology Ecosystem
The flowchart at the opening of this chapter shows how the major systems interconnect.
The clinical trial technology stack consists of four interconnected core platforms, each serving a distinct but complementary function (Medidata Solutions 2024a).
Electronic Data Capture (EDC) is the primary tool for clinical data collection. EDC systems replace paper case report forms with validated electronic forms that capture patient data at the point of care. When a coordinator records a blood pressure reading, administers a questionnaire, or documents an adverse event, that data flows into the EDC. Modern EDC platforms include built-in edit checks that flag impossible values (a heart rate of 500?) or logical inconsistencies (an adverse event dated before the patient enrolled) in real time, catching errors before they propagate. The EDC database ultimately becomes the foundation for regulatory submissions—every efficacy and safety analysis traces back to data captured here (U.S. Food and Drug Administration 2023a).
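Conceptually, an edit check is a small predicate evaluated when data is entered. The sketch below illustrates the idea in Python with hypothetical field names and plausibility limits; real EDC platforms express these as validated, configuration-driven rules rather than ad hoc code:

```python
from datetime import date

# Illustrative plausibility limits -- real edit checks are protocol-specific
# and validated before go-live.
LIMITS = {"heart_rate": (20, 300), "systolic_bp": (50, 260)}

def check_range(field: str, value: float) -> list[str]:
    """Flag values outside physiologically plausible limits."""
    lo, hi = LIMITS[field]
    if not lo <= value <= hi:
        return [f"{field}={value} outside plausible range [{lo}, {hi}]"]
    return []

def check_ae_dates(ae_onset: date, enrollment: date) -> list[str]:
    """Flag a logical inconsistency: an adverse event cannot predate enrollment."""
    if ae_onset < enrollment:
        return [f"AE onset {ae_onset} precedes enrollment {enrollment}"]
    return []

# Both checks fire here, producing two queries for site review.
queries = check_range("heart_rate", 500) + check_ae_dates(
    date(2024, 1, 2), date(2024, 3, 1)
)
```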
Clinical Trial Management System (CTMS) is the operational command center. While EDC captures patient data, CTMS tracks trial operations: which sites are open, how many patients each has enrolled, when the next monitoring visit is scheduled, and what the budget burn rate looks like. CTMS provides the project management backbone that keeps a 50-site, 14-country trial from descending into chaos. It tracks milestones, manages contracts and payments, and generates the operational metrics that sponsors use to assess trial health (Grand View Research 2025).
Electronic Trial Master File (eTMF) is the regulatory archive. Every clinical trial generates thousands of documents: the protocol and its amendments, informed consent forms, IRB approvals, investigator CVs, monitoring reports, safety letters, and correspondence. Regulators require sponsors to maintain a complete Trial Master File as evidence that the trial was conducted properly. eTMF systems organize these documents according to the DIA Reference Model, track document completeness, and ensure inspection readiness (TMF Reference Model Initiative 2024). When an FDA inspector arrives, the eTMF is the first artifact they examine.
Randomization and Trial Supply Management (RTSM), sometimes called Interactive Response Technology (IRT), handles the logistics of treatment assignment and drug supply. When a patient is eligible for randomization, the RTSM system assigns them to a treatment arm according to the randomization scheme—maintaining the blind while ensuring balanced allocation. Simultaneously, RTSM tracks investigational product inventory at each site, triggers resupply shipments, and manages the complex logistics of getting the right drug to the right patient at the right time. For trials with temperature-sensitive biologics or personalized therapies, RTSM is mission-critical (Clinical Leader 2024).
These four systems do not operate in isolation—they exchange data continuously. When a patient is randomized in RTSM, that information flows to CTMS (updating enrollment counts) and EDC (enabling treatment-specific data collection). When a monitoring visit is completed, the report is filed in eTMF while the visit status updates in CTMS. This integration explains why unified platforms like Veeva Vault, which house all four systems in a single architecture, have gained such traction in the market (Veeva Systems 2024b).
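This event fan-out can be sketched as a publish/subscribe pattern. The topic name, handlers, and messages below are illustrative; production stacks rely on validated integration middleware rather than in-process callbacks:

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe bus, standing in for middleware."""
    def __init__(self):
        self.handlers = defaultdict(list)
    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)
    def publish(self, topic, event):
        for handler in self.handlers[topic]:
            handler(event)

bus = EventBus()
actions = []

# Each downstream system registers its own reaction to the same event.
bus.subscribe("patient.randomized", lambda e: actions.append(f"CTMS: enrollment +1 at {e['site']}"))
bus.subscribe("patient.randomized", lambda e: actions.append(f"EDC: unlock arm-specific CRFs for {e['subject']}"))
bus.subscribe("patient.randomized", lambda e: actions.append(f"RTSM: check resupply for {e['site']}"))
bus.subscribe("patient.randomized", lambda e: actions.append(f"eTMF: expect arm-specific documents for {e['subject']}"))

# A single randomization event propagates to all four systems.
bus.publish("patient.randomized", {"subject": "SUBJ-001", "site": "US-014"})
```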
## 23.2 Agentic AI in Clinical Workflows
As AI moves beyond static models, agentic systems are emerging that can navigate complex regulated workflows by coordinating multiple tools and reasoning steps.
### Building Blocks and System Architecture
The recent resurgence of “agents” is best understood as a systems shift. Rather than calling a model once, agentic systems orchestrate multiple calls, state, and tools to complete a task: they decompose goals, retrieve context, take actions, and iterate based on feedback (Cheng et al. 2024; Wang et al. 2024). The Berkeley AI Research perspective on “compound AI systems” provides a useful framing: reliability gains often come from engineered compositions—retrievers, checkers, constrained tool calls, and repeated sampling—rather than from a single monolithic model invocation (Berkeley Artificial Intelligence Research 2024).
Three technical ideas recur across modern agent systems. First, structured intermediate reasoning (e.g., chain-of-thought prompting) can improve performance on complex tasks, though it does not guarantee correctness in the presence of missing or stale information (Wei et al. 2022). Second, retrieval-augmented generation (RAG) externalizes “memory” into a maintained knowledge base and can provide provenance when paired with citations to retrieved sources (Lewis et al. 2020; Douze et al. 2024). Third, coupling reasoning with tool use—where a model decides when to query, calculate, or fetch—yields more controllable trajectories than purely generative text, as formalized in approaches that interleave reasoning and acting (Yao et al. 2023).
In clinical operations, these building blocks map to concrete needs: pulling the right protocol amendment from an eTMF, verifying whether a site has an updated IRB approval, or checking that a safety narrative is consistent with the underlying case data. Each is fundamentally a retrieval + verification problem under audit constraints—not a “creative writing” task.
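A minimal sketch of retrieval with provenance, using a toy word-overlap scorer and an invented three-document corpus (real systems use embeddings, vector indexes, and validated document stores):

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by term overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc_id: -len(terms & set(corpus[doc_id].lower().split())))
    return ranked[:k]

def answer_with_provenance(query: str, corpus: dict[str, str]) -> dict:
    """Return retrieved context together with document IDs, so any
    downstream generation step can cite its sources."""
    doc_ids = retrieve(query, corpus)
    return {"context": " ".join(corpus[d] for d in doc_ids), "citations": doc_ids}

# Invented corpus standing in for an eTMF index.
corpus = {
    "amendment-3": "protocol amendment 3 updates the eligibility criteria",
    "irb-2024": "IRB approval renewed for site 014 in 2024",
    "sop-dm": "data management SOP for query handling",
}
result = answer_with_provenance("which amendment updates eligibility criteria", corpus)
```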
Operationalizing agents also requires consistent interfaces to external systems. Tool calling standards such as the Model Context Protocol (MCP) aim to standardize how AI applications connect to data sources and actions, turning “integration” into a first-class design surface (Anthropic 2025). In regulated contexts, the value is not novelty; it is the ability to enforce access controls, log every tool invocation, and make agent actions reviewable.
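In such settings the control points matter more than the plumbing. A sketch of what a gated, logged tool invocation might look like, with invented tool and agent names and an in-memory list standing in for a durable audit store:

```python
import json
import time

AUDIT_LOG: list[dict] = []          # stand-in for a durable, append-only store
ALLOWED_TOOLS = {"etmf_lookup"}     # illustrative role-based allow-list

def call_tool(agent_id: str, tool: str, args: dict) -> dict:
    """Gate every agent tool call behind an allow-list and log the attempt,
    so each action is attributable and reviewable after the fact."""
    entry = {"ts": time.time(), "agent": agent_id, "tool": tool,
             "args": json.dumps(args, sort_keys=True)}
    if tool not in ALLOWED_TOOLS:
        entry["outcome"] = "denied"
        AUDIT_LOG.append(entry)
        raise PermissionError(f"tool {tool!r} is not permitted for this agent")
    entry["outcome"] = "executed"
    AUDIT_LOG.append(entry)
    return {"tool": tool, "status": "ok"}  # a real dispatcher would run the tool here

call_tool("agent-dm-01", "etmf_lookup", {"doc": "IRB approval, site 014"})
try:
    call_tool("agent-dm-01", "database_write", {"table": "adverse_events"})
except PermissionError:
    pass  # denied attempts are still logged
```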
Clinical workflows are naturally multi-role (data management, monitoring, safety, regulatory, sites). Multi-agent frameworks formalize this by allocating roles and coordinating conversations among specialized agents (Wu et al. 2023). However, empirical analyses show that multi-agent systems often fail in predictable ways: specification gaps, coordination failures, and weak verification loops (Cemri et al. 2025). From an engineering perspective, this implies that “agent quality” must be evaluated not only by task accuracy, but also by cost, robustness, and reproducibility (Kapoor et al. 2024). Human-in-the-loop debugging and steering tools are emerging for these systems, reflecting the practical need to inspect and edit multi-step traces rather than treating outputs as black boxes (Epperson et al. 2025).
As soon as agents run concurrently across multiple workflows, operational concerns become first-order: context windows become a managed resource, tools must be scheduled and access-controlled, and traces must be stored in an auditable way. Emerging “agent runtime” proposals make these concerns explicit by separating agent applications from shared services such as scheduling, context management, and access control (Mei et al. 2024). Similarly, pipeline frameworks that treat agent workflows as composable graphs support systematic optimization and regression testing across versions—important for any environment where changes must be validated rather than “shipped and hoped” (Khattab et al. 2023).
Evaluation is also moving beyond single-task accuracy. Benchmarks emphasize realistic multi-step work with tool use and resource constraints, and propose clearer reporting of cost and reproducibility (Chan et al. 2024; Cappello et al. 2025).
### Human Oversight and Limitations
Finally, agentic systems can create an illusion of competence: fluent outputs can be mistaken for validated decisions. This is a known socio-technical risk in scientific work, where productivity gains may coexist with a decline in genuine understanding and critical scrutiny (Messeri and Crockett 2024). Human–AI interaction guidelines emphasize making uncertainty visible, supporting oversight and correction, and ensuring users understand system limitations—principles that map directly to quality management and inspection readiness (Amershi et al. 2019).
## 23.3 Regulatory Framework for AI in Clinical Trials
For any sponsor or CRO considering AI applications in clinical trials, understanding the regulatory landscape is essential. This section provides a practical framework for determining what can and cannot be done with AI, what documentation and oversight are required, and when to engage regulators.
### Enforceable Standards vs. Recommendations
A critical distinction exists between enforceable legal requirements and regulatory guidance. Failure to comply with enforceable requirements can result in warning letters, clinical holds, or rejection of submissions. Guidance documents represent FDA or EMA “current thinking” and are recommendations, not mandates—though departing from them without justification invites scrutiny.
| Category | Examples | Consequence of Non-Compliance |
|---|---|---|
| Enforceable Law/Regulation | 21 CFR Part 11 (electronic records), 21 CFR Part 312 (INDs), ICH E6 GCP, EU Clinical Trials Regulation | Warning letters, clinical holds, application rejection, criminal liability |
| Enforceable GxP Standards | ICH E6(R3) computerized systems requirements, Annex 11 (EU), data integrity requirements | Inspection findings, Form 483 observations, regulatory action |
| Regulatory Guidance | FDA draft guidance on AI for regulatory decisions (Jan 2025), EMA reflection paper on AI (Sept 2024) | Increased scrutiny, requests for additional information, delays |
### The 10 Guiding Principles for Good AI Practice
In January 2026, FDA (CDER and CBER) and EMA jointly published Guiding Principles of Good AI Practice in Drug Development—the first joint regulatory statement establishing foundational expectations for AI across the drug product lifecycle (U.S. Food and Drug Administration and European Medicines Agency 2026). While not legally binding, these principles represent international consensus on what “good practice” means for AI in drug development.
The following is quoted verbatim from the joint FDA-EMA document:
- **Human-centric by design** The development and use of AI technologies align with ethical and human-centric values.
- **Risk-based approach** The development and use of AI technologies follow a risk-based approach with proportionate validation, risk mitigation, and oversight based on the context of use and determined model risk.
- **Adherence to standards** AI technologies adhere to relevant legal, ethical, technical, scientific, cybersecurity, and regulatory standards, including Good Practices (GxP).
- **Clear context of use** AI technologies have a well-defined context of use (role and scope for why it is being used).
- **Multidisciplinary expertise** Multidisciplinary expertise covering both the AI technology and its context of use are integrated throughout the technology’s life cycle.
- **Data governance and documentation** Data source provenance, processing steps, and analytical decisions are documented in a detailed, traceable, and verifiable manner, in line with GxP requirements. Appropriate governance, including privacy and protection for sensitive data, is maintained throughout the technology’s life cycle.
- **Model design and development practices** The development of AI technologies follows best practices in model and system design and software engineering and leverages data that is fit-for-use, considering interpretability, explainability, and predictive performance. Good model and system development promotes transparency, reliability, generalizability, and robustness for AI technologies contributing to patient safety.
- **Risk-based performance assessment** Risk-based performance assessments evaluate the complete system including human-AI interactions, using fit-for-use data and metrics appropriate for the intended context of use, supported by validation of predictive performance through appropriately designed testing and evaluation methods.
- **Life cycle management** Risk-based quality management systems are implemented throughout the AI technologies’ life cycles, including to support capturing, assessing, and addressing issues. The AI technologies undergo scheduled monitoring and periodic re-evaluation to ensure adequate performance (e.g., to address data drift).
- **Clear, essential information** Plain language is used to present clear, accessible, and contextually relevant information to the intended audience, including users and patients, regarding the AI technology’s context of use, performance, limitations, underlying data, updates, and interpretability or explainability.
These principles operationalize the core regulatory expectation: AI in drug development must be validated, documented, and maintained with the same rigor as any other regulated activity—but with additional attention to the unique characteristics of AI systems, including their data-dependency, potential opacity, and tendency to drift over time.
### The FDA Risk-Based Credibility Framework
FDA’s January 2025 draft guidance “Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products” establishes a seven-step risk-based framework for AI applications in drug development. While the guidance is not legally binding, it represents FDA’s expected approach for evaluating AI-generated evidence.
The FDA guidance applies when AI is used to produce information or data intended to support regulatory decision-making regarding safety, effectiveness, or quality. It does not apply to:
- AI used in drug discovery (before IND)
- AI used purely for operational efficiencies (e.g., internal workflows, resource allocation) that do not impact patient safety, drug quality, or study reliability
The Seven-Step Credibility Assessment Process:
- Define the Question of Interest: What specific question, decision, or concern is the AI model addressing?
- Define the Context of Use (COU): What is the specific role and scope of the AI model? Will other evidence be used alongside it?
- Assess Model Risk: Combine model influence (contribution of AI evidence relative to other evidence) with decision consequence (significance of adverse outcomes from incorrect decisions)—see Figure 23.2 for how common AI applications map onto this two-dimensional risk space
- Develop a Credibility Assessment Plan: Document model architecture, training data, evaluation methods, and performance metrics—with rigor proportional to model risk
- Execute the Plan: Implement the credibility assessment activities
- Document Results: Prepare a credibility assessment report with any deviations from the plan
- Determine Adequacy: Evaluate whether model credibility is sufficient for the COU
```mermaid
quadrantChart
title AI Model Risk Assessment
x-axis Low Model Influence --> High Model Influence
y-axis Low Decision Consequence --> High Decision Consequence
quadrant-1 High Risk
quadrant-2 Medium Risk
quadrant-3 Low Risk
quadrant-4 Medium Risk
"Patient stratification (sole determinant)": [0.85, 0.90]
"Eligibility screening support": [0.40, 0.50]
"Document classification": [0.30, 0.20]
"Data quality triage": [0.45, 0.35]
"Dosing recommendations": [0.75, 0.85]
"Protocol complexity scoring": [0.35, 0.30]
```
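The two-axis combination in step 3 can be encoded as a simple lookup. The mapping below illustrates the quadrant logic only; the draft guidance describes a matrix-style judgment, not fixed categorical cutoffs:

```python
def model_risk(model_influence: str, decision_consequence: str) -> str:
    """Map the two axes of step 3 to a risk tier ("low"/"high" inputs).
    Illustrative quadrant logic, not thresholds taken from the guidance."""
    high_influence = model_influence == "high"
    high_consequence = decision_consequence == "high"
    if high_influence and high_consequence:
        return "high"
    if high_influence or high_consequence:
        return "medium"
    return "low"

# Examples roughly mirroring Figure 23.2:
sole_stratifier = model_risk("high", "high")  # AI as sole determinant of stratification
doc_classifier = model_risk("low", "low")     # document classification with human review
```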
### What You CAN and CANNOT Do with AI in Clinical Trials
The following table synthesizes enforceable requirements and regulatory expectations from FDA guidance, EMA reflection papers, and ICH E6(R3). Applications are categorized by risk level, with specific requirements and limitations for each.
| AI Application | Regulatory Status | Requirements | Limitations |
|---|---|---|---|
| Document classification and TMF filing | Permitted with oversight | Validate classification accuracy; maintain human review for inspection-critical documents | Cannot replace human accountability for TMF completeness |
| Data quality screening and query generation | Permitted with oversight | Document AI logic; human review of generated queries before sending to sites | Cannot auto-close queries without human verification |
| Patient matching/eligibility pre-screening | Permitted as decision support | Validate against eligibility criteria; investigator makes final determination | Cannot make final eligibility decisions—investigator responsibility under GCP |
| Protocol complexity scoring | Permitted | Document methodology; validate predictions against historical data | Operational tool only—no regulatory submission required |
| Site selection and feasibility | Permitted | Document data sources and model logic | Operational tool; sponsor retains responsibility for site qualification |
| Adverse event case processing | Permitted with lifecycle monitoring | Monitor model performance; human review of serious/unexpected cases | Cannot replace pharmacovigilance qualified person oversight |
| Statistical analysis endpoints | Permitted with pre-specification | Pre-specify in SAP; freeze model before database lock; prospective validation required for high-impact uses | Cannot modify model after unblinding; post hoc AI analysis is exploratory only |
| Primary endpoint assessment | Conditional—high regulatory scrutiny | EMA: prospective testing with newly acquired data required; FDA: credibility assessment proportional to risk | Model must be frozen and fully documented in SAP |
| Dosing/treatment assignment decisions | Conditional—requires early FDA/EMA engagement | Full credibility assessment; human-in-the-loop required; extensive safety monitoring | Cannot be sole determinant without validated safety controls |
| Replacing clinical judgment | Not permitted | — | AI supports decisions; investigators and physicians retain accountability under GCP |
Under ICH E6 GCP and FDA regulations, the investigator is responsible for medical decisions affecting trial participants, and the sponsor is responsible for trial conduct and data integrity. AI can support these decisions but cannot replace the accountable human. An investigator cannot defend a protocol deviation by stating “the AI said it was acceptable.”
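One practical way to demonstrate that a model was frozen before database lock, as the table above requires for statistical analysis uses, is to record a cryptographic fingerprint of the model artifact and its pre-processing configuration in the SAP. A sketch, with placeholder bytes standing in for real serialized weights:

```python
import hashlib
import json

def freeze_fingerprint(model_bytes: bytes, preprocessing: dict) -> str:
    """Fingerprint the model artifact together with its pre-processing
    configuration; the hex digest can be recorded in the SAP and re-checked
    at analysis time to show nothing changed after database lock."""
    digest = hashlib.sha256()
    digest.update(model_bytes)
    digest.update(json.dumps(preprocessing, sort_keys=True).encode())
    return digest.hexdigest()

config = {"impute": "median", "scale": "zscore"}               # illustrative pipeline config
fp_locked = freeze_fingerprint(b"model-weights-v1", config)    # recorded in the SAP
fp_checked = freeze_fingerprint(b"model-weights-v1", config)   # recomputed at analysis
fp_modified = freeze_fingerprint(b"model-weights-v2", config)  # any change is visible
```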
### Validation and Documentation Requirements
AI systems used in clinical trials must meet the same computerized systems validation requirements as any other regulated software. The applicable standards depend on the regulatory jurisdiction and the system’s role.
21 CFR Part 11 (FDA) and Annex 11 (EU) Requirements:
Both frameworks require that computerized systems generating, modifying, or storing electronic records for regulatory submissions meet validation and control standards:
- System validation: Documented evidence that software performs as intended, including AI model verification
- Audit trails: Computer-generated, time-stamped trails recording all system actions that create, modify, or delete electronic records
- Access controls: Unique user identification, authentication, and role-based permissions
- Data integrity: Controls ensuring data is attributable, legible, contemporaneous, original, and accurate (ALCOA+)
- Operational controls: Documented procedures for system use, maintenance, and change control
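The audit-trail requirement can be illustrated with a hash-chained log, where each entry commits to its predecessor so that silent edits become detectable on verification. This is a simplified illustration of the principle, not a Part 11 implementation:

```python
import hashlib
import json
from datetime import datetime, timezone

def append_entry(trail: list, user: str, action: str, record_id: str) -> dict:
    """Append a time-stamped entry whose hash covers its content and the
    previous entry's hash, forming a tamper-evident chain."""
    prev_hash = trail[-1]["hash"] if trail else "0" * 64
    body = {"ts": datetime.now(timezone.utc).isoformat(), "user": user,
            "action": action, "record": record_id, "prev": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    trail.append(body)
    return body

def verify(trail: list) -> bool:
    """Recompute every hash and link; any edited entry breaks the chain."""
    for i, entry in enumerate(trail):
        expected_prev = trail[i - 1]["hash"] if i else "0" * 64
        body = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != expected_prev or recomputed != entry["hash"]:
            return False
    return True

trail: list[dict] = []
append_entry(trail, "coordinator-7", "modify", "VS-0042")
append_entry(trail, "monitor-2", "verify", "VS-0042")
intact = verify(trail)
trail[0]["user"] = "someone-else"        # simulate tampering with a past entry
tamper_detected = not verify(trail)
```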
ICH E6(R3) Computerized Systems Requirements (January 2025):
The updated GCP guideline adds specific requirements relevant to AI:
- Fitness-for-purpose: Systems must be validated to be fit for the specific use in the trial
- Data governance: Dedicated section requiring documented data and records management
- Risk-based validation: Proportional approach based on impact on patient safety and data reliability
- Metadata and automated sources: Recognition of data from wearables, sensors, and automated systems as primary source data
| Requirement | 21 CFR Part 11 | Annex 11 | ICH E6(R3) |
|---|---|---|---|
| System validation | Required | Required | Required (risk-based) |
| Audit trails | Required | Required | Required |
| Access controls | Required | Required | Required |
| Change control | Required | Required | Required |
| Data backup/recovery | Required | Required | Required |
| Training documentation | Required | Required | Required |
| Supplier qualification | — | Required | Required |
### AI Model Lifecycle Maintenance
Unlike static software, AI models may degrade or drift over time as input data distributions change. Both FDA and EMA guidance emphasize lifecycle maintenance for AI models deployed over extended periods.
When Lifecycle Maintenance is Required:
- AI models used in manufacturing (e.g., quality control, process optimization)
- AI models used in pharmacovigilance (e.g., case classification, signal detection)
- Any AI system where model performance may change with new data inputs
Lifecycle Maintenance Activities:
- Performance monitoring: Define metrics and thresholds; monitor on risk-based frequency
- Drift detection: Identify when input data diverges from training data distribution
- Revalidation triggers: Pre-define conditions requiring model retesting or retraining
- Change management: Evaluate all model changes through pharmaceutical quality system
- Regulatory notification: Report changes impacting model performance per applicable requirements (e.g., 21 CFR 314.70)
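Drift detection is often operationalized with distribution-comparison statistics. The sketch below computes a Population Stability Index (PSI) between a training-era sample and current inputs; the 0.1/0.25 interpretation bands are a common industry rule of thumb, not a regulatory requirement:

```python
import math

def psi(expected: list[float], observed: list[float], bins: int = 5) -> float:
    """Population Stability Index between a training-era sample ("expected")
    and current inputs ("observed"). Common rule of thumb: < 0.1 stable,
    0.1-0.25 moderate drift, > 0.25 investigate/retrain."""
    lo, hi = min(expected), max(expected)
    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[max(0, min(idx, bins - 1))] += 1
        # Laplace smoothing avoids log(0) for empty bins.
        return [(c + 1) / (len(sample) + bins) for c in counts]
    e, o = bin_fractions(expected), bin_fractions(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))

baseline = [i / 100 for i in range(100)]       # training-era distribution
shifted = [0.7 + i / 500 for i in range(100)]  # current inputs cluster high
stable_score = psi(baseline, baseline)  # identical samples: no drift
drift_score = psi(baseline, shifted)    # well past the "action" band
```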
For pharmacovigilance applications, EMA permits a “more flexible approach” in which incremental learning can continuously enhance models. However, the marketing authorization holder (MAH) retains responsibility to validate, monitor, and document model performance as part of the pharmacovigilance system.
### When to Engage Regulators
Early engagement with FDA or EMA is strongly recommended for AI applications with high regulatory impact or high patient risk. The table below summarizes engagement pathways.
| AI Use Case | Recommended Engagement | FDA Contact | EMA Contact |
|---|---|---|---|
| Novel clinical trial design using AI | CDER C3TI or CID Meeting Program | CDERclinicaltrialinnovation@fda.hhs.gov | Innovation Task Force (ITF) |
| AI for endpoint evaluation | Drug Development Tools (DDT) or ISTAND | CDERBiomarkerQualificationProgram@fda.hhs.gov | SAWP qualification |
| AI-enabled digital health technology | DHT Program | DHTsforDrugDevelopment@fda.hhs.gov | — |
| AI in pharmacovigilance | Emerging Drug Safety Technology Program | AIMLforDrugDevelopment@fda.hhs.gov | PRAC interaction |
| AI in manufacturing | Emerging Technology Program (CDER ETP) | CDERETT@fda.hhs.gov | — |
| Model-informed drug development | MIDD Paired Meeting Program | MIDD@fda.hhs.gov | SAWP scientific advice |
### EMA-Specific Considerations
The EMA reflection paper (September 2024) introduces terminology and expectations that differ somewhat from FDA:
- High patient risk: AI uses affecting patient safety (e.g., dosing, treatment assignment)
- High regulatory impact: AI uses substantially affecting regulatory decisions (e.g., primary endpoint analysis)
- Risk-based approach: Rigor of credibility assessment should be proportional to risk level
Key EMA Positions:
Transparent models preferred: “The use of transparent models is preferred” to strengthen accountability. Black box models may be acceptable if transparent models show unsatisfactory performance, with additional documentation and monitoring requirements.
Frozen models for pivotal trials: “Prior to the database lock and subsequent unblinding…the data pre-processing pipeline and all models should be frozen and documented in a traceable manner in the statistical analysis plan.”
Prospective validation for high-impact uses: “For inference in late-stage clinical development…performance should be tested with prospectively generated data (future calendar time) that is acquired in a setting or population representative of the intended context of use.”
No incremental learning in pivotal trials: “Incremental learning approaches are not accepted, and any modification of the model during the trial requires a regulatory interaction.”
Human-in-the-loop for precision medicine: AI-driven indication or posology recommendations are “high patient risk as well as high regulatory impact” and require “fall-back treatment strategies in cases of technical failure.”
### Practical Implications: A Decision Framework
For sponsors evaluating whether and how to deploy AI in a clinical trial, Figure 23.3 provides a practical decision framework:
flowchart TD
A[Proposed AI Application] --> B{Does AI produce data/information<br/>for regulatory decisions?}
B -->|No - Operational only| C[Lower regulatory burden<br/>Document for inspection readiness]
B -->|Yes| D{What is the decision consequence<br/>if AI output is incorrect?}
D -->|Low| E[Low-risk application<br/>Standard validation<br/>Document methodology]
D -->|Medium/High| F{What is the AI model influence?}
F -->|Low - other evidence<br/>also used| G[Medium-risk application<br/>Proportional credibility assessment<br/>Human oversight of outputs]
F -->|High - AI is primary<br/>or sole evidence| H[High-risk application<br/>Full credibility assessment<br/>Early regulatory engagement<br/>Prospective validation<br/>Pre-specified in SAP]
C --> I[Proceed with GxP-compliant<br/>documentation and oversight]
E --> I
G --> J[Develop credibility assessment plan<br/>Consider regulatory feedback]
H --> K[Engage FDA/EMA before deployment<br/>Full 7-step credibility process]
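The triage logic in Figure 23.3 can be written as a small function. This is a minimal sketch of the two-factor risk reasoning (decision consequence × model influence); the tier names and thresholds are illustrative, not taken from any FDA or EMA guidance text.

```python
# Illustrative sketch of the Figure 23.3 triage. Names are hypothetical.

def classify_ai_risk(regulatory_use: bool, consequence: str, influence: str) -> str:
    """Map an AI application to a risk tier.

    consequence: 'low', 'medium', or 'high' -- impact if the output is wrong.
    influence:   'low' or 'high' -- whether AI is the primary/sole evidence.
    """
    if not regulatory_use:
        return "operational"   # lower burden; document for inspection readiness
    if consequence == "low":
        return "low-risk"      # standard validation, documented methodology
    if influence == "low":
        return "medium-risk"   # proportional credibility assessment, human oversight
    return "high-risk"         # full credibility assessment, early engagement

# A site-scheduling assistant whose output never feeds a regulatory decision:
print(classify_ai_risk(regulatory_use=False, consequence="low", influence="low"))
# An AI model used as the primary evidence in an endpoint analysis:
print(classify_ai_risk(regulatory_use=True, consequence="high", influence="high"))
```

The point of encoding the framework this way is that the first question (does the output feed a regulatory decision?) dominates everything downstream, which matches how both agencies scope their expectations.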
Summary: The Regulatory Bottom Line
The regulatory framework for AI in clinical trials can be summarized in five principles:
AI supports, it does not replace: Human accountability for medical decisions and regulatory compliance cannot be delegated to AI systems. Investigators, sponsors, and MAHs retain responsibility.
Risk determines rigor: The rigor of validation, documentation, and oversight should be proportional to model risk—a combination of decision consequence and model influence.
Pre-specification is essential for confirmatory evidence: AI models used to generate evidence for regulatory submissions must be frozen and documented in the statistical analysis plan before database lock. Post hoc AI analysis is exploratory only.
Lifecycle maintenance is required for deployed models: AI systems operating over time (manufacturing, pharmacovigilance) require ongoing performance monitoring, drift detection, and change management.
Early engagement de-risks novel applications: For high-risk AI applications, early consultation with FDA or EMA can align expectations and prevent costly late-stage objections.
23.4 The Foundational “Backbone”: Platform Wars
For decades, clinical data lived in silos—spreadsheets here, PDFs there, fax machines everywhere. Today, unified platforms serve as the operating system for clinical research. The clinical trial platform market is dominated by a handful of enterprise players, with intense competition driving innovation (Medidata Solutions 2024a):
| Vendor | Primary Strengths | Market Position | Cloud Model | AI Capabilities |
|---|---|---|---|---|
| Medidata (Dassault) | Industry-standard EDC (Rave), 25-year track record, 36,000+ studies | Market leader in EDC | Cloud/SaaS | AI-powered signal detection, synthetic control arms |
| Veeva Systems | Unified Vault platform (eTMF, CTMS, EDC), life sciences focus | Fast-growing challenger | Cloud-native | TMF Intake Agent, Quality Check Agent |
| Oracle | Enterprise scale, Siebel Clinical One, regulatory expertise | Established incumbent | Cloud/On-prem | ML-based safety analytics |
| IQVIA | Real-world data integration, global CRO services | CRO-integrated platform | Cloud/SaaS | Intelligent eTMF, predictive enrollment |
Over 57% of new clinical trial system deployments are now cloud-based, up from 30% five years ago (International Data Corporation 2024). The pandemic accelerated this shift.
Major Platform Vendors
Medidata (acquired by Dassault Systemes in 2019) remains the dominant EDC platform, with its Rave EDC system recognized as the industry standard. The 2025 ISR Benchmarking Report ranked Rave EDC as the top-preferred EDC system based on independent sponsor evaluations. Medidata’s scale is substantial: over 700,000 certified site users, 1.8 million EDC users, and more than 36,000 studies managed across all phases and therapeutic areas (Medidata Solutions 2024b).
Medidata’s AI capabilities include Acorn AI, which provides synthetic control arms using historical patient data to reduce or eliminate placebo groups in certain trial designs. Their Sensor Cloud integrates wearable device data directly into the EDC, enabling continuous physiological monitoring without manual data entry.
Veeva has rapidly gained market share by offering a unified Vault platform that integrates eTMF, CTMS, and EDC in a single system—a contrast to Medidata’s historically modular approach. Veeva’s exclusive focus on life sciences (unlike Oracle or Salesforce, which serve multiple industries) has allowed deep specialization. Beyond its core platform, Veeva is deploying specialized AI Agents that automate the most tedious parts of clinical operations. Figure 23.4 illustrates the document processing workflow:
sequenceDiagram
participant Site as Site Upload
participant Intake as TMF Intake Agent
participant QC as Quality Check Agent
participant TMF as eTMF Vault
participant User as Document Manager
Site->>Intake: Upload document (PDF/scan)
Intake->>Intake: Extract metadata<br/>(investigator, date, type)
Intake->>Intake: Classify to DIA artifact
Intake->>QC: Route for quality check
QC->>QC: Check for signatures
QC->>QC: Validate completeness
alt Document Complete
QC->>TMF: File to correct binder
TMF->>User: Notification: "Document filed"
else Issues Found
QC->>User: Alert: "Missing signature"
User->>Site: Request correction
end
Veeva’s AI capabilities center on two key agents. The TMF Intake Agent automatically classifies documents uploaded by sites, extracting metadata such as investigator name and document date to route files to the correct TMF binder. The Quality Check Agent reviews documents for errors—missing signatures, wrong versions, incomplete forms—before a human ever sees them, reducing TMF backlog by up to 80% according to Veeva’s published benchmarks (Veeva Systems 2024a).
The eTMF market alone is worth $1.4 billion and growing at 12.8% annually (MarketsandMarkets 2024). Three vendors are competing for dominance:
| Feature | Veeva eTMF | IQVIA eTMF | Phlexglobal eTMF |
|---|---|---|---|
| Auto-Classification | AI-powered DIA mapping | ML-based indexing | Intelligent auto-filing |
| Completeness Prediction | Expected document lists | Milestone-based gaps | Risk-based prioritization |
| Inspection Readiness | Real-time dashboards | Inspection-ready reports | Audit trail analytics |
| Site Integration | SiteVault connected | Site-facing portal | Sponsor-site bridge |
| Unique Strength | Unified Vault ecosystem | RWD integration | eTMF-specialist focus |
These platforms use AI in two ways. First, auto-indexing uses machine learning models to classify unorganized scans into the DIA Reference Model structure. Second, completeness prediction algorithms identify missing documents based on study milestones—for example, flagging that “Site 101 has been initiated but is missing a financial disclosure form” (IQVIA 2024; Phlexglobal 2024).
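The completeness-prediction idea reduces to a set difference: each milestone implies an expected document list, and anything expected but not filed is a gap. A toy sketch, where the milestone-to-artifact map is illustrative and not the DIA Reference Model:

```python
# Toy milestone-driven completeness check. The mapping below is invented
# for illustration; real systems derive it from the DIA Reference Model.

EXPECTED_BY_MILESTONE = {
    "site_initiated": {"1572/IRB approval", "CV", "financial disclosure form"},
    "first_patient_in": {"signed protocol", "delegation log"},
}

def missing_documents(milestones_reached, documents_filed):
    expected = set()
    for m in milestones_reached:
        expected |= EXPECTED_BY_MILESTONE.get(m, set())
    return sorted(expected - set(documents_filed))

gaps = missing_documents(
    milestones_reached=["site_initiated"],
    documents_filed=["CV", "1572/IRB approval"],
)
print(gaps)  # → ['financial disclosure form']
```

The ML contribution in production systems is upstream of this check: classifying scans so that "documents_filed" is accurate in the first place.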
Medable took a different path—building for the decentralized trial from day one. As hybrid and virtual trials became mainstream, Medable’s modular platform enables patients to participate from home (Medable 2024a):
| Capability | What It Does | Impact |
|---|---|---|
| TeleVisit | Video conferencing for remote assessments | Reduced travel for suitable protocols (implementation-dependent) |
| eConsent | Multimedia-rich digital consent | Improved comprehension and workflow consistency (context-dependent) |
| Medable AI | Generates digital eCOA from paper protocols | Faster digitization and reuse of instruments (vendor-reported) |
| TMF Automation | Processes DCT-generated document flood | Helps manage higher document volume (protocol-dependent) |
23.5 AI across the Development Lifecycle
AI capabilities are now being applied across every phase of clinical development—from protocol design through recruitment, operations, and data management. This section examines specific tools and workflows at each stage, focusing on where automation delivers measurable value and where human oversight remains essential.
AI in Protocol Design
Benchmarking studies suggest that most clinical trials fail to meet planned enrollment timelines, often because of design choices baked in before the first patient is screened (Lamberti et al. 2024a). AI tools now address this problem at the protocol design stage, as illustrated in Figure 23.5.
flowchart LR
subgraph Input["Inputs"]
RWD[Real-World Data<br/>large cohorts]
Hist[Historical Trials<br/>Protocol library]
Reg[Regulatory Requirements]
end
subgraph AI["AI Analysis"]
Sim[Patient Simulation]
Burden[Burden Scoring]
Feasibility[Site Feasibility]
end
subgraph Output["Outputs"]
Protocol[Optimized Protocol]
SoA[Schedule of Activities]
Sites[Recommended Sites]
end
RWD --> Sim
Hist --> Burden
Reg --> Protocol
Sim --> Feasibility
Burden --> Protocol
Feasibility --> Sites
Protocol --> SoA
| Tool | Data Source | Primary Use Case | Key Metric |
|---|---|---|---|
| Faro Health | Protocol library + RWD | Operational burden prediction | Complexity score |
| Phesi | 100M+ patient profiles | Enrollment simulation | Patient availability |
| Medidata AI | Historical trial data | Protocol optimization | Predicted enrollment rate |
| TrialSpark | Site network data | Site selection | Per-site enrollment probability |
Faro Health exemplifies this new approach. Instead of writing a static Word document, study teams design the trial in a structured cloud platform. The AI predicts operational burden by scoring the complexity of the schedule of assessments against real-world data, visualizes patient burden by identifying visits that require too many procedures, and generates documents by automating creation of the protocol and Schedule of Activities (Faro Health 2024).
Phesi takes simulation even further, leveraging data from over 100 million patients to model trial outcomes before finalizing the protocol. Their Digital Patient Profile reduces the likelihood of the “zero-enrollment site” problem mentioned in Chapter 16.
AI in Recruitment and Site Selection
Finding the right patients remains the perennial bottleneck. Benchmarking studies suggest that recruitment and retention often take substantially longer and cost more than planned, and that each day of delay can be economically material for sponsors in some therapeutic areas (Lamberti et al. 2024b; Deloitte Centre for Health Solutions 2025). AI tools now scan electronic health records to find “needle in the haystack” candidates—and, when integrated into workflow, can reduce the manual effort required to identify potentially eligible participants. Figure 23.6 shows the typical pipeline:
flowchart LR
subgraph Sources["Data Sources"]
EHR[EHR Systems]
Claims[Claims Data]
Labs[Lab Results]
Genomics[Genomic Profiles]
end
subgraph NLP["NLP Processing"]
Extract[Entity Extraction]
Normalize[Terminology Normalization]
Temporal[Temporal Reasoning]
end
subgraph Match["Matching Engine"]
Criteria[I/E Criteria Parser]
Score[Eligibility Scoring]
Rank[Patient Ranking]
end
subgraph Output["Results"]
Patients[Matched Patients]
Sites[Optimized Sites]
Alerts[Provider Alerts]
end
EHR --> Extract
Claims --> Extract
Labs --> Normalize
Genomics --> Normalize
Extract --> Criteria
Normalize --> Criteria
Temporal --> Score
Criteria --> Score
Score --> Rank
Rank --> Patients
Patients --> Sites
Sites --> Alerts
| Vendor | Technology | Data Assets | Best For |
|---|---|---|---|
| NextTrial.ai | NLP + ML matching | EHR integration | Complex I/E criteria |
| H1 | KOL mapping, investigator analytics | Publications, claims, trials | Site selection, investigator finding |
| Deep 6 AI | Real-time EHR search | Health system partnerships | Oncology, rare disease |
| TriNetX | Federated network | 400M+ patient records | Global feasibility |
| Komodo Health | Healthcare map | Claims + RWD | Patient path analysis |
NextTrial.ai uses natural language processing to bridge the gap between protocol criteria and patient records. The platform ingests unstructured data—reading clinician notes, pathology reports, and genetic profiles—then matches patients by automatically flagging those who meet complex inclusion/exclusion criteria, and optimizes site selection by predicting which investigator sites have the highest density of eligible patients (NextTrial 2024).
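Once NLP has normalized the record, the matching step itself is conceptually simple: each inclusion/exclusion criterion becomes a predicate over structured fields. A minimal sketch (field names and criteria are invented for illustration; they are not NextTrial.ai's schema):

```python
# Toy eligibility matcher over a normalized patient record. In practice
# the hard part is upstream: NLP extraction of these fields from notes.

CRITERIA = [
    ("age 18-75",          lambda p: 18 <= p["age"] <= 75),
    ("ECOG 0-1",           lambda p: p["ecog"] <= 1),
    ("no prior anti-EGFR", lambda p: "anti-EGFR" not in p["prior_therapies"]),
]

def eligibility(patient):
    failed = [name for name, pred in CRITERIA if not pred(patient)]
    return {"eligible": not failed, "failed_criteria": failed}

patient = {"age": 62, "ecog": 1, "prior_therapies": ["FOLFOX"]}
print(eligibility(patient))  # all three criteria pass
```

Returning the list of failed criteria, rather than a bare yes/no, is what makes such systems useful to coordinators: a patient failing only one waivable criterion is worth a second look.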
H1 takes a different approach, mapping the global network of Key Opinion Leaders and investigators. By analyzing billions of data points—publications, claims data, and clinical trial records—H1 helps sponsors find investigators who are not just academically prominent but actively treating the target patient population (H1 2024).
AI in Clinical Operations and Data Quality
Once data starts flowing, it must be cleaned. As data volume increases (wearables generate 1,000+ data points per patient per day), manual query resolution becomes unsustainable. In practice, the emerging approach layers multiple AI techniques across the data pipeline, each suited to different types of quality problems.
Rule-based checks remain the foundation: programmed validations that flag impossible values (negative ages, dates in the future) or logical inconsistencies (randomization before consent). These deterministic checks are fast, auditable, and well-understood.
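The deterministic checks described above can be sketched in a few lines. Field names are illustrative; in production these edit checks are defined inside the EDC itself:

```python
# Minimal sketch of rule-based edit checks (illustrative field names).
from datetime import date

def edit_checks(record):
    issues = []
    if record["age"] < 0:
        issues.append("age is negative")
    if record["visit_date"] > date.today():
        issues.append("visit date is in the future")
    if record["randomization_date"] < record["consent_date"]:
        issues.append("randomized before informed consent")
    return issues

rec = {"age": 54,
       "visit_date": date(2024, 3, 1),
       "consent_date": date(2024, 2, 10),
       "randomization_date": date(2024, 2, 1)}
print(edit_checks(rec))  # → ['randomized before informed consent']
```

Because every rule is explicit, the output is fully auditable—the property that keeps rule-based checks foundational even as ML layers are added on top.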
Machine learning anomaly detection adds statistical pattern recognition. ML models trained on historical trial data can identify unusual distributions, digit preferences suggestive of fabrication, or sites whose data patterns diverge from comparators. Unlike rule-based systems, ML can surface problems that were not anticipated during study design.
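One classical statistical screen of this kind is terminal-digit preference: manually entered measurements should have roughly uniform last digits, so a heavy excess of 0s and 5s suggests rounding or fabrication. A toy sketch (the 0.2 baseline is the chance expectation for two of ten digits; any alert threshold would be study-specific):

```python
# Toy digit-preference screen. Real anomaly detection combines many such
# features with models trained on historical trial data.
from collections import Counter

def terminal_digit_excess(values):
    """Fraction of values ending in 0 or 5, minus the 0.2 expected by chance."""
    digits = [int(str(int(v))[-1]) for v in values]
    counts = Counter(digits)
    return (counts[0] + counts[5]) / len(digits) - 0.2

suspicious = [120, 125, 130, 110, 115, 120, 125, 135, 140, 115]
print(round(terminal_digit_excess(suspicious), 2))  # every value ends in 0/5 → 0.8
```

A site whose blood-pressure readings all end in 0 or 5 is not necessarily fraudulent—analog cuffs encourage rounding—but the pattern is exactly the kind of unanticipated signal ML-era monitoring is designed to surface for human review.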
Large language models (LLMs) contribute to discrepancy analysis—parsing free-text fields in adverse event narratives, medical history, or concomitant medication entries to identify inconsistencies that would be invisible to structured checks. An LLM might flag that a narrative describes “chest pain radiating to left arm” while the coded adverse event is “headache.”
Cross-domain validation links data across sources: comparing EDC entries against central lab results, device uploads against visit schedules, and ePRO responses against clinical observations. Discrepancies across domains often indicate transcription errors or protocol deviations.
The output is a set of automated actions: auto-generated queries with draft text for site response, risk flags that prioritize monitoring attention, trend alerts that surface site-level quality patterns, and quality reports for oversight review (Figure 23.7). The goal is not to replace data management staff but to reduce the time spent on low-value triage and increase the proportion of effort spent on judgment-intensive resolution.
flowchart LR
subgraph Ingest["Data Ingestion"]
EDC[EDC Data]
Devices[Device Data]
Labs[Central Labs]
ePRO[ePRO Responses]
end
subgraph AI["AI Processing"]
Rules[Rule-Based Checks]
ML[ML Anomaly Detection]
LLM[LLM Discrepancy Analysis]
Cross[Cross-Domain Validation]
end
subgraph Actions["Automated Actions"]
AutoQuery[Auto-Generated Queries]
Flag[Risk Flags]
Trend[Trend Alerts]
Report[Quality Reports]
end
EDC --> Rules
Devices --> ML
Labs --> Cross
ePRO --> LLM
Rules --> AutoQuery
ML --> Flag
LLM --> AutoQuery
Cross --> Trend
Flag --> Report
Trend --> Report
| Tool | Primary Approach | Best Feature | Integration Depth |
|---|---|---|---|
| Saama | Clinical Command Center | Unified EDC/CTMS/eTMF view | Deep multi-system |
| Octozi | LLM-based discrepancy detection | Natural language queries | EDC-focused |
| Veeva CDB | Vault-native data management | Single-platform simplicity | Veeva ecosystem |
| Medidata Detect | Signal detection algorithms | Safety signal identification | Rave ecosystem |
Saama provides an AI-driven Clinical Command Center. Their platform unifies data from EDC, CTMS, and eTMF to provide an integrated view of trial health. Saama’s AI models, trained on more than 300 million data points, predict site non-compliance and enrollment delays, enabling intervention before problems become critical (Saama Technologies 2024).
Octozi applies Large Language Models to automate data review. The platform performs automated discrepancy detection, scanning for inconsistencies such as “male patient listed as pregnant” or “adverse event date before informed consent.” By surfacing these issues instantly, Octozi reduces the need for the line-by-line manual review that traditionally consumed data management resources (Octozi 2024).
23.6 Emerging Methodologies and Operational Solutions
Innovation in clinical trials is not limited to software; it includes new methodological frameworks that leverage real-world data, federated learning, and digital simulations to rethink the evidence-generation process.
Solving Logistical Challenges with IT
Modern trials face significant logistical friction (see Section 18.6.1). The challenges are structural: supply chains that must deliver temperature-sensitive products to thousands of endpoints, sites overwhelmed by redundant data entry across disconnected systems, protocols too complex for manual compliance tracking, and global operations fragmented across time zones and organizations. These are coordination problems, and coordination problems respond to better information infrastructure.
Supply chain complexity has intensified with decentralized trials and direct-to-patient distribution. Traditional approaches relied on spreadsheets and email to track inventory and shipments; delays were discovered after the fact. Modern systems integrate IoT sensors that monitor temperature, location, and chain of custody in real time. AI-based demand forecasting uses enrollment trajectories and visit schedules to predict resupply needs before stockouts occur. The result is reduced waste (fewer expired products) and fewer patient visits disrupted by supply failures.
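The forecasting logic described above can be reduced to a back-of-envelope projection: kit demand over a horizon is driven by the current census, the enrollment trajectory, and the visit schedule. A minimal sketch with invented, illustrative numbers:

```python
# Toy resupply forecast. Real systems replace the constant enrollment
# rate with a fitted trajectory and per-visit kit requirements.

def kits_needed(enrolled, weekly_enrollment, weeks_ahead, visits_per_patient_per_week):
    """Expected kit consumption over the forecast horizon."""
    total = 0.0
    patients = enrolled
    for _ in range(weeks_ahead):
        patients += weekly_enrollment            # new patients joining each week
        total += patients * visits_per_patient_per_week
    return total

demand = kits_needed(enrolled=40, weekly_enrollment=3, weeks_ahead=4,
                     visits_per_patient_per_week=0.5)
print(demand)  # compare against on-hand inventory to time the next shipment
```

The value of even this crude model is that it turns resupply from a reactive process (discovering a stockout) into a scheduled one (shipping before projected demand exceeds inventory).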
Site burden accumulates when coordinators must enter the same data into multiple systems—pulling information from the electronic health record, transcribing it to the EDC, and reconciling discrepancies later. EHR-to-EDC integration automates the flow of structured data (lab values, vital signs, demographics) from the source system to the trial database, reducing transcription errors and freeing coordinator time for patient-facing work. This integration requires careful validation and mapping, but when implemented well, it addresses one of the most persistent complaints from clinical sites.
Protocol deviations often result from complexity: too many visits, too many procedures, too many eligibility criteria for staff to track manually. Real-time nudge systems monitor visit windows and upcoming assessments, alerting coordinators before a deviation occurs rather than flagging it retrospectively. Detection systems analyze patterns across sites to identify systematic compliance failures that may indicate training gaps or protocol design problems.
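A nudge engine of this kind is, at its core, a date comparison run continuously: alert while the visit window is still open rather than flag the deviation afterwards. A minimal sketch (visit schema and the 3-day warning horizon are illustrative):

```python
# Toy visit-window nudge. Field names and the warning horizon are invented.
from datetime import date, timedelta

def nudges(scheduled_visits, today, warn_days=3):
    """Visits not yet completed whose window closes within `warn_days`."""
    alerts = []
    for visit in scheduled_visits:
        if visit["completed"]:
            continue
        window_close = visit["target"] + timedelta(days=visit["window_days"])
        if today <= window_close <= today + timedelta(days=warn_days):
            alerts.append(f"{visit['id']}: window closes {window_close}")
    return alerts

visits = [
    {"id": "W4", "target": date(2024, 6, 1), "window_days": 7, "completed": False},
    {"id": "W8", "target": date(2024, 7, 1), "window_days": 7, "completed": False},
]
print(nudges(visits, today=date(2024, 6, 6)))  # W4's window closes 2024-06-08
```

The difference from retrospective monitoring is entirely in when this check runs: daily against open windows, not monthly against closed ones.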
Global operations compound all of these challenges. When a trial spans 25 countries, 80 sites, and multiple CRO partners, fragmented regional teams operating from different data sources create reconciliation overhead and conflicting reports. Unified command centers aggregate operational data into a single platform, providing consistent metrics, standardized escalation pathways, and a shared view of trial status across all stakeholders.
| Challenge | Traditional Approach | Modern IT Solution | Reported Impact |
|---|---|---|---|
| Supply Chain | Spreadsheets, email | IoT sensors, AI forecasting | Reduced waste, fewer stockouts |
| Site Burden | Redundant data entry | EHR-to-EDC integration | Less transcription time |
| Protocol Deviations | Retrospective monitoring | Real-time nudge engines | Fewer major deviations |
| Global Operations | Fragmented regional teams | Unified command centers | Single source of truth |
Novel Methodological Approaches
This section covers a set of approaches that challenge traditional assumptions about how evidence is generated and compared: using observational data to support decisions (RWE), using historical or synthetic comparators to reduce concurrent control enrollment, and using models to predict counterfactual outcomes. All remain subjects of active regulatory and scientific debate.
Clinical research data are fragmented across institutions, each with privacy regulations, competitive interests, and technical barriers to sharing. Federated learning addresses this by training models collaboratively without centralizing data: algorithms travel to data sources, train locally, and share only model updates (gradients or weights) rather than patient records. Owkin has applied federated learning to construct external control arms from real-world data distributed across hospital networks. Their 2025 publication in Nature Communications demonstrated federated external control arms for oncology trials, enabling international collaboration while maintaining GDPR and HIPAA compliance (Owkin 2024). The practical value is clearest in rare diseases and oncology, where recruiting concurrent controls may be infeasible or ethically questionable. However, the approach requires sophisticated infrastructure, standardized data formats across institutions, and careful attention to the comparability of patient populations.
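The central mechanic—models travel, data stays put—can be shown in miniature with federated averaging of per-site gradients. This toy fits a one-parameter linear model across two "hospitals" that never pool their records; it is a conceptual sketch, not Owkin's implementation:

```python
# Toy federated averaging: each site computes a gradient locally on its
# own data; only the gradient (never patient records) is shared.

def site_gradient(w, data):
    """Mean squared-error gradient for y ~ w*x on one site's data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def federated_fit(site_datasets, rounds=200, lr=0.05):
    w = 0.0
    for _ in range(rounds):
        grads = [site_gradient(w, d) for d in site_datasets]  # computed at each site
        w -= lr * sum(grads) / len(grads)                     # only updates aggregated
    return w

# Two sites whose (never-pooled) data are both consistent with y = 2x:
site_a = [(1.0, 2.0), (2.0, 4.0)]
site_b = [(3.0, 6.0), (0.5, 1.0)]
print(round(federated_fit([site_a, site_b]), 3))  # converges to 2.0
```

Real deployments add secure aggregation, differential privacy, and handling of non-identically-distributed site data—the hard parts that the toy omits—but the privacy argument rests on exactly this structure: gradients cross institutional boundaries, records do not.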
The term digital twin originated in engineering, where virtual models of physical systems enable simulation and optimization. In clinical research, digital twins are computational models that predict an individual patient’s disease trajectory under control conditions, based on baseline characteristics and historical data from similar patients (Laubenbacher, Sluka, and Glazier 2021).
Unlearn.AI’s PROCOVA methodology exemplifies this approach. Their models—trained on historical patient data—generate prognostic scores that predict each patient’s likely outcome if assigned to the control arm. These scores are then used as covariates in the primary analysis, reducing residual variance and enabling smaller control groups while maintaining unbiased treatment effect estimates and Type I error control (Unlearn.AI 2024).
The European Medicines Agency issued a favorable qualification opinion for PROCOVA in September 2022—the first regulatory endorsement of a machine-learning method for sample size reduction in pivotal trials. In January 2024, FDA confirmed that PROCOVA does not deviate from current statistical guidance and is an acceptable methodology. In favorable settings, the approach can reduce control arm sizes while maintaining error control, allowing more participants to receive experimental treatment (Unlearn.AI 2024).
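The statistical intuition behind prognostic-score adjustment can be demonstrated with a small simulation: if a model predicts each patient's control-arm outcome well, subtracting that prediction before comparing arms removes between-patient variance and tightens the treatment-effect estimate. This is a sketch of the general idea on simulated data, not the PROCOVA implementation:

```python
# Simulated demonstration of variance reduction from prognostic adjustment.
# All data are synthetic; the "prognostic score" is known by construction.
import random
import statistics

def one_trial(rng, n=100, tau=0.5):
    est_raw, est_adj = [], []
    for arm in (0, 1):
        for _ in range(n):
            prog = rng.gauss(0, 1)                 # prognostic score for this patient
            y = prog + tau * arm + rng.gauss(0, 0.3)
            est_raw.append((arm, y))
            est_adj.append((arm, y - prog))        # subtract the predicted outcome
    diff = lambda pairs: (statistics.mean(y for a, y in pairs if a == 1)
                          - statistics.mean(y for a, y in pairs if a == 0))
    return diff(est_raw), diff(est_adj)

rng = random.Random(0)
raw, adj = zip(*(one_trial(rng) for _ in range(300)))
print(f"SD of estimate, raw: {statistics.stdev(raw):.3f}  adjusted: {statistics.stdev(adj):.3f}")
```

Both estimators are unbiased for the true effect (0.5 here); the adjusted one is simply much less variable, which is what permits smaller control arms at the same power. Berry Consultants' critique, discussed below, is that this gain is available from any good covariate adjustment, proprietary or not.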
Real-World Evidence and Regulatory Considerations
The randomized controlled trial remains the gold standard for establishing causal treatment effects, but real-world evidence (RWE)—derived from observational data collected outside the controlled trial setting—is playing an expanding role in drug development and regulatory decision-making.
Real-world data (RWD) includes:
- electronic health records
- claims databases
- disease registries
- wearable devices
- patient-generated data
When analyzed appropriately, RWD can generate real-world evidence about drug safety, effectiveness, and utilization patterns. The distinction matters: RWD is the raw data; RWE is the clinical evidence derived from it through rigorous analysis (U.S. Food and Drug Administration 2018).
FDA has articulated when RWE may support regulatory decisions. For safety, RWE has long been used for post-marketing surveillance—detecting rare adverse events that trials were not powered to identify. For effectiveness, FDA acceptance is more cautious but expanding. The 21st Century Cures Act directed FDA to evaluate RWE for approving new indications for existing drugs and for satisfying post-marketing study requirements. FDA’s 2023 final guidance on RWE for regulatory decisions emphasized that data relevance, reliability, and analytic rigor determine acceptability—not the mere existence of a large dataset (U.S. Food and Drug Administration 2023c).
The practical applications span a spectrum. Single-arm trials with external controls compare treated patients against matched historical or concurrent observational cohorts—most defensible in diseases with well-characterized natural history and no effective standard of care. Hybrid designs randomize a reduced control arm while borrowing information from external data to increase statistical precision. Post-marketing effectiveness studies use RWD to assess whether efficacy demonstrated in controlled trials translates to real-world populations with comorbidities and concomitant medications excluded from pivotal trials.
The core challenge is confounding: in observational data, treatment selection is not random. Patients who receive a therapy may differ systematically from those who do not, and these differences—rather than the treatment itself—may explain observed outcomes. Propensity score methods, instrumental variables, and target trial emulation frameworks attempt to address confounding, but none can fully substitute for randomization. Unmeasured confounders remain a fundamental limitation.
RWE is most credible when: the outcome is objective and reliably captured in routine care; the comparison is against natural history rather than an active comparator; the patient population in the RWD source is demonstrably similar to the trial population; and sensitivity analyses show robustness to plausible unmeasured confounding. Even then, regulators typically view RWE as supportive rather than dispositive for efficacy claims—strengthening a submission that includes randomized evidence rather than replacing it.
FDA’s 2023 guidance on externally controlled trials provides a framework for using real-world data to construct external control arms, addressing data quality, comparability, and bias mitigation strategies (U.S. Food and Drug Administration 2023b). The guidance emphasizes that external controls are most appropriate when concurrent randomization is infeasible or unethical, and when the disease has a well-characterized natural history with reliable outcome measurement in routine care.
Synthetic control arms (SCAs) construct external comparators from patient-level data in historical trials rather than recruiting new control patients. Medidata’s platform draws on a database spanning over 36,000 trials and 11 million patients to statistically match historical controls to current trial populations.
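The matching step at the heart of an SCA can be sketched as nearest-neighbor selection on baseline covariates. This toy matches without replacement on raw (unstandardized) values; real pipelines use propensity models, standardization, and balance diagnostics on top of this idea, and all names here are invented:

```python
# Toy historical-control matching on baseline covariates (illustrative).

def match_controls(patients, historical, keys=("age", "ecog")):
    def dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in keys)
    matches = []
    pool = list(historical)
    for p in patients:
        best = min(pool, key=lambda h: dist(p, h))
        matches.append((p["id"], best["id"]))
        pool.remove(best)          # match without replacement
    return matches

patients   = [{"id": "P1", "age": 60, "ecog": 1}, {"id": "P2", "age": 45, "ecog": 0}]
historical = [{"id": "H1", "age": 44, "ecog": 0}, {"id": "H2", "age": 62, "ecog": 1},
              {"id": "H3", "age": 70, "ecog": 2}]
print(match_controls(patients, historical))  # → [('P1', 'H2'), ('P2', 'H1')]
```

Note what the sketch cannot do: it matches only on measured covariates. The comparability concerns raised later in this section—era effects, changes in standard of care, unmeasured confounders—are precisely the things no matching algorithm, however sophisticated, can see.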
In October 2020, FDA approved a precedent-setting hybrid SCA for Medicenna Therapeutics’ Phase III trial in recurrent glioblastoma—the first acceptance of a hybrid external control in a registrational trial for an indication that previously required traditional 1:1 randomization. The approach reduced prospective control enrollment by approximately two-thirds (Medidata Solutions 2024c).
SCAs are most applicable in rare or life-threatening diseases with inadequate standard-of-care, where historical control data are robust and disease progression is well-characterized. FDA has been most receptive in early-phase development and single-arm trials, with hybrid models (combining historical and concurrent controls) gaining acceptance for later stages.
These technologies generate both enthusiasm and legitimate skepticism. Several challenges merit attention.
Verification, validation, and uncertainty quantification (VVUQ) remain incompletely standardized. A 2025 review in npj Digital Medicine emphasized that VVUQ frameworks are essential for safety and efficacy but vary widely across implementations, hampering regulatory evaluation and clinical adoption (Sel et al. 2025).
Model opacity is a recurring concern. Berry Consultants has argued that PROCOVA is essentially an extension of classical covariate adjustment—substituting a proprietary neural network for transparent regression models applied to the same baseline data. When model details are withheld, sponsors and regulators cannot interrogate, replicate, or improve the methodology. As statistician Scott Berry has noted: “I highly doubt in most scenarios that…this is actually better” than standard covariate adjustment with the same data.
Data quality and completeness constrain all approaches. In oncology, the data needed to model tumor dynamics are often noisy, incomplete, and subject to collection burden (Venkatesh, Raza, and Kvedar 2022). Statistical models introduce their own uncertainty, particularly when predictions must generalize across populations and disease subtypes.
Uncertainty quantification is essential but often underemphasized. A single predicted outcome per patient is inadequate; scientifically defensible use requires generating full distributions of potential outcomes—what Berry calls “digital googols” rather than digital twins—to capture the uncertainty inherent in counterfactual prediction.
Comparability assumptions underpin all external control approaches. If the historical population differs systematically from the current trial population—due to changes in standard of care, patient selection, or measurement practices—treatment effect estimates may be biased regardless of the sophistication of the matching algorithm.
FDA has signaled that digital twins and AI-assisted trial design will receive significant regulatory oversight when they directly affect trial conduct and interpretation. In a January 2025 JAMA publication, FDA Commissioner Robert Califf and senior officials described digital twins as an area of active concern, distinct from lower-risk AI applications like patient matching or data cleaning that receive lighter scrutiny (Warraich, Tazbaz, and Califf 2024).
The regulatory path forward will likely require sponsors to demonstrate not only statistical validity but also model transparency, reproducibility, and appropriate characterization of uncertainty. Technologies that rely on opaque, proprietary methods may face higher evidentiary bars than those built on interpretable, well-documented approaches.
| Approach | How It Works | Regulatory Status | Key Limitation |
|---|---|---|---|
| Federated Learning | Models train locally; only updates shared | EMA letter of support; FDA engagement | Infrastructure complexity |
| Digital Twins (PROCOVA) | Prognostic scores as covariates | EMA qualified (2022); FDA acceptable (2024) | Model opacity; uncertainty |
| Synthetic Control Arms | Historical data matched to trial | FDA approved hybrid Phase III (2020) | Comparability assumptions |
23.7 Future Impact and Economics
The convergence of AI and digital platforms is shifting the fundamental cost drivers of clinical development, potentially reducing cycle times and sample sizes while introducing new operational complexities.
AI Impact on Trial Operations
The technologies described in this chapter have the potential to reshape the economics of clinical development. Recall the economics from Chapter 6: $2.3 billion to bring a drug to market, $40,000 per day to operate a Phase III trial, and $600,000 to $1.3 million per day in opportunity cost from delays (DiMasi, Grabowski, and Hansen 2016; Deloitte Centre for Health Solutions 2025). AI-enabled tools are being applied to each of these cost drivers, though the realized impact depends on implementation quality, governance, and the specific operational context.
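To make these figures concrete, consider a back-of-envelope calculation for a single month of delay, using the daily operating burn and the midpoint of the quoted opportunity-cost range (the 30-day delay is an assumed scenario, not a cited statistic):

```python
# Back-of-envelope cost of a 30-day Phase III delay at the figures
# quoted above. The delay length is an illustrative assumption.
operating_per_day = 40_000
opportunity_per_day = (600_000 + 1_300_000) / 2   # midpoint of the quoted range
delay_days = 30

print(f"${delay_days * (operating_per_day + opportunity_per_day):,.0f}")  # → $29,700,000
```

At roughly $30 million per month of avoidable delay, even AI tools that shave modest percentages off enrollment or cleaning timelines clear most plausible cost-of-deployment hurdles.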
| Current Cost Structure | AI Interventions | Impact Estimates |
|---|---|---|
| ~$2–3B to market (est.) | Protocol Optimization AI | Reduced Development Cost |
| High attrition (~90%) | Digital Twins & Synthetic Arms | Smaller Trials, Similar Power |
| High daily burn (~$40K/day) | Autonomous Agents | Increased Automation |
| CRO margins (~15–25%) | Direct Automation | Margin Pressure |
| Time to first patient (~166 days avg) | Parallel Processing AI | Faster Timelines |
One way to make the economics concrete is to translate “cost to market” into a small set of dominant drivers: late-stage attrition, enrollment delays, protocol amendments, monitoring intensity, and the long tail of data cleaning. The question is not whether automation can eliminate scientific uncertainty, but whether it can shift these operational drivers enough to change the capitalized cost of development in a meaningful way. The table below summarizes the main levers discussed in this chapter in a format that mirrors how sponsors often reason about cost: what drives it today, what kind of automation or analytics is proposed, and what types of impact are plausibly expected.
| Cost Driver | Current State | AI Approach | Potential Impact |
|---|---|---|---|
| Failed Programs | ~90% of drugs do not reach market | Protocol simulation, patient selection AI | Earlier go/no-go decisions; potentially fewer late-stage failures |
| Enrollment Delays | Most trials miss enrollment targets | NLP-based patient matching, predictive site selection | Faster enrollment (magnitude varies by indication) |
| Data Cleaning | Manual query resolution at $50-100/query | ML-assisted triage and draft resolutions | Reduced query burden |
| Monitoring Costs | 9-14% of CRO budget on site visits | Risk-based + AI-detected anomalies | Reduced on-site visit frequency |
| Protocol Amendments | $500K+ per substantial amendment | Design simulation before finalization | Potentially fewer amendments |
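One way to reason about how these levers combine is to sketch the arithmetic directly. The sketch below is illustrative only: the cost shares and reduction percentages are hypothetical assumptions, not estimates from this chapter's sources, and the point is the compounding structure rather than the specific numbers.

```python
# Illustrative sketch: how operational levers might compound into total savings.
# All shares and reduction percentages are hypothetical assumptions.

BASELINE_COST_M = 100.0  # assumed Phase III direct cost in $M (cf. the ~$100M benchmark)

# (driver, assumed share of cost affected, assumed reduction within that share)
levers = [
    ("enrollment delays",   0.25, 0.30),
    ("data cleaning",       0.10, 0.50),
    ("on-site monitoring",  0.12, 0.40),
    ("protocol amendments", 0.05, 0.40),
]

remaining = BASELINE_COST_M
for name, share, reduction in levers:
    saving = BASELINE_COST_M * share * reduction
    remaining -= saving
    print(f"{name:<20} saves ${saving:5.1f}M")

print(f"residual cost: ${remaining:.1f}M "
      f"({100 * (1 - remaining / BASELINE_COST_M):.0f}% reduction)")
```

Under these placeholder assumptions the levers sum to roughly a 19% reduction, which illustrates why partial automation of several drivers matters more than full automation of any single one.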
CROs currently command gross margins of 40-50% (with operating margins typically 6-16%) on top of direct costs (see Chapter 6). AI-enabled tools are increasingly capable of performing tasks that have traditionally justified those margins—document processing, query management, routine monitoring, and medical coding. As sponsors develop their own AI capabilities, they may demand either lower CRO fees or differentiated services that require domain expertise AI cannot replicate. The speed and extent of this shift remain uncertain.
The “daily burn” of operating a trial is largely a labor-and-coordination cost: monitors reviewing data, coordinators entering and reconciling information, medical writers preparing narrative sections, and project managers coordinating activities across teams and vendors. One way to analyze how automation might change this cost is to decompose spend by role and task category, recognizing that some activities are amenable to partial automation (for example, drafting or triage) while others remain inherently judgment- and accountability-driven. The table below provides an illustrative decomposition.
| Role | Current Cost Contribution | AI Automation Potential | Timeline |
|---|---|---|---|
| Data Manager | Query generation, cleaning | 80% automatable | Now - 2027 |
| Clinical Monitor | Source verification, oversight | 50% automatable (remote SDV) | 2025 - 2028 |
| Medical Writer | Narratives, CSR sections | 60% draft automation | Now - 2026 |
| Project Manager | Status tracking, reporting | 40% automatable | 2026 - 2029 |
| Regulatory Affairs | Submission compilation | 70% automatable | 2025 - 2027 |
In principle, if automation reduces repetitive coordination work and shortens the reconciliation tail, the operating cost per day could decline. The magnitude and reliability of any reduction depend on protocol complexity, data sources, and the quality of implementation and oversight.
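This decomposition can be made concrete with a small calculation that combines the automation potentials from the table above with assumed cost shares per role. The cost shares (and the residual "other" bucket) are hypothetical assumptions introduced for illustration.

```python
# Sketch: decompose the daily trial operating cost by role, then apply the
# automation fractions from the table above. Cost shares are hypothetical.

DAILY_BURN = 40_000  # $/day benchmark cited in the text

# role -> (assumed share of daily burn, automation potential from the table)
roles = {
    "data manager":            (0.20, 0.80),
    "clinical monitor":        (0.30, 0.50),
    "medical writer":          (0.10, 0.60),
    "project manager":         (0.15, 0.40),
    "regulatory affairs":      (0.10, 0.70),
    "other (not automatable)": (0.15, 0.00),
}

new_daily = sum(DAILY_BURN * share * (1 - auto) for share, auto in roles.values())
print(f"illustrative post-automation burn: ${new_daily:,.0f}/day")
```

With these placeholder shares, the implied post-automation burn is about half the baseline, which is why even aggressive role-level automation figures translate into a more modest total reduction: the non-automatable residual and the partially automated roles dominate.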
The logistics challenges described in Chapter 18—multi-country execution, dozens of sites, long startup timelines, and tightly constrained supply chains—are coordination problems in a precise sense: progress depends on many interdependent tasks that are distributed across organizations, time zones, and systems of record. Delays rarely arise from a single missing document or a single late shipment; they arise from cascades, such as a contract delay that postpones site activation, which shifts enrollment curves, which changes drug demand forecasts, which increases the risk of stockouts or wastage. At the same time, oversight requirements impose constraints: actions must be traceable, exceptions must be reviewable, and accountability cannot be delegated to an opaque process.
In that environment, “agentic” automation is most defensible when it behaves like structured orchestration rather than free-form autonomy (Figure 23.8). The practical contributions are to maintain state across workflows, to triage and route exceptions, to generate standardized artifacts (for example, draft correspondence or document metadata), and to propose next actions that a responsible owner can approve. Used this way, orchestration can reduce coordination overhead by turning scattered operational signals into a prioritized work queue with clear provenance, while leaving high-stakes decisions—such as changes that affect participant safety, protocol interpretation, or regulatory commitments—under explicit human control.
flowchart LR
subgraph Orchestrator["Trial Orchestration Agent"]
Master[Master Coordinator]
end
subgraph Functional["Functional Agents"]
Site[Site Activation Agent]
Enroll[Enrollment Agent]
Supply[Supply Chain Agent]
Data[Data Quality Agent]
Safety[Safety Monitoring Agent]
Doc[Document Agent]
end
subgraph Actions["Autonomous Actions"]
A1[Generate site contracts]
A2[Match patients to criteria]
A3[Predict inventory needs]
A4[Resolve queries]
A5[Draft safety narratives]
A6[File TMF documents]
end
Master --> Site
Master --> Enroll
Master --> Supply
Master --> Data
Master --> Safety
Master --> Doc
Site --> A1
Enroll --> A2
Supply --> A3
Data --> A4
Safety --> A5
Doc --> A6
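The exception-routing pattern described above can be sketched as a prioritized work queue that preserves provenance and gates high-stakes items behind human approval. The event schema, priority values, and agent names below are illustrative, not a real system's API.

```python
# Sketch of the orchestration pattern in Figure 23.8: functional agents emit
# signals into one queue; high-stakes items are held for human approval while
# routine items proceed with an audit trail. Schema and priorities are hypothetical.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class WorkItem:
    priority: int                       # lower = more urgent; only field compared
    source: str = field(compare=False)  # provenance: which agent raised the item
    action: str = field(compare=False)
    needs_human: bool = field(compare=False, default=True)

queue: list[WorkItem] = []
heapq.heappush(queue, WorkItem(2, "Supply Chain Agent", "reorder depot stock", False))
heapq.heappush(queue, WorkItem(0, "Safety Monitoring Agent", "review SAE narrative draft", True))
heapq.heappush(queue, WorkItem(1, "Document Agent", "file missing 1572 to TMF", False))

processed = []
while queue:
    item = heapq.heappop(queue)
    gate = "HOLD for human approval" if item.needs_human else "auto-execute with audit log"
    processed.append(item.action)
    print(f"[{item.source}] {item.action} -> {gate}")
```

The design point is that the safety item surfaces first but never auto-executes: urgency ordering and autonomy level are independent decisions, which is what keeps accountability with a responsible owner.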
Site activation—getting from protocol approval to first patient enrolled—currently averages 166 days for Phase III trials (Lamberti et al. 2024a). The delays are distributed across sequential steps: feasibility assessment, contract negotiation, IRB/EC submission, site training, and investigational product shipment. Each step has its own bottleneck, and the steps are typically processed in sequence rather than in parallel.
AI-enabled tools can address each bottleneck. Predictive site scoring, trained on historical enrollment data, can replace manual feasibility questionnaires. Contract generation from standardized templates can reduce legal negotiation cycles. Auto-populated submission packages can accelerate IRB preparation. Adaptive e-learning modules can replace in-person training for routine content. Predictive supply chain systems can coordinate IP shipment with site readiness.
The potential impact is not merely faster execution of each step but a shift from sequential to parallel processing—initiating multiple activation workstreams simultaneously rather than waiting for each to complete. The realized gains depend on integration quality and the extent to which sites and sponsors adopt common platforms.
Humans and organizations often process activation tasks sequentially. Software can execute some workstreams in parallel—such as assembling submission-ready document sets or preparing templated contract packets—while still requiring review, signature, and site-specific customization. Parallelization can reduce elapsed time, but the magnitude depends on local ethics processes, contracting norms, and the degree of platform standardization.
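The sequential-versus-parallel distinction can be made precise with a toy critical-path calculation. The step durations below are hypothetical allocations of the cited 166-day average, and the fully parallel figure is an idealized lower bound: real dependencies (for example, training after contract execution) keep the achievable elapsed time above it.

```python
# Sketch: site-activation elapsed time, sequential vs parallelized.
# Step durations (days) are hypothetical; only the ~166-day total is the cited benchmark.
steps = {
    "feasibility assessment": 30,
    "contract negotiation":   60,
    "IRB/EC submission":      45,
    "site training":          14,
    "IP shipment":            17,
}

sequential = sum(steps.values())  # each step waits on the previous one
parallel = max(steps.values())    # idealized: all workstreams start at once
print(f"sequential: {sequential} days; fully parallel lower bound: {parallel} days")
```

The gap between the two numbers is the prize that parallel activation workflows target; how much of it is capturable depends on which dependencies are genuinely sequential.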
The rise of decentralized trials (DCTs) creates a logistics challenge of much greater scale: instead of shipping investigational product to a limited number of sites, sponsors may ship to thousands of patient homes and coordinate services across a wider set of vendors and geographies (U.S. Food and Drug Administration 2024). In practice, this pushes organizations toward greater automation and analytics, not because human oversight becomes unnecessary, but because routine coordination tasks and exception handling can otherwise overwhelm operational teams. The relevant question is therefore not “automation or not,” but which components can be standardized and monitored with traceable workflows while keeping safety- and compliance-critical decisions under accountable review.
| DCT Challenge | Traditional Approach | AI-Enabled Approach |
|---|---|---|
| Cold-Chain Monitoring | Periodic temperature logs | Real-time IoT + predictive intervention |
| Home Nursing Coordination | Manual scheduling | AI-assisted routing and scheduling |
| Patient Adherence | Site calls, diaries | Wearable + app AI detecting non-adherence patterns |
| Document Collection | Email reminders, faxes | TMF agents that chase, collect, and file automatically |
| Regulatory Compliance | Per-country manual review | Multi-jurisdiction compliance AI |
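As a concrete example of the cold-chain row, a minimal excursion rule might look like the sketch below. The 2-8 °C band is the standard refrigerated range, while the reading interval, tolerance, and sensor trace are hypothetical parameters chosen for illustration.

```python
# Sketch: a minimal cold-chain rule for DCT home shipments -- flag a shipment
# when cumulative time outside the 2-8 degC band exceeds an assumed tolerance.
# Reading interval, tolerance, and the example trace are hypothetical.

INTERVAL_MIN = 10    # minutes between IoT sensor readings (assumed)
TOLERANCE_MIN = 30   # assumed allowable cumulative excursion, minutes

def excursion_minutes(readings_c, lo=2.0, hi=8.0, interval=INTERVAL_MIN):
    """Cumulative minutes spent outside the [lo, hi] temperature band."""
    return sum(interval for t in readings_c if not lo <= t <= hi)

shipment = [4.1, 4.3, 8.9, 9.2, 9.0, 7.8, 5.0, 1.6, 4.2]  # example sensor trace
minutes_out = excursion_minutes(shipment)
print(f"{minutes_out} min out of range ->",
      "INTERVENE (reroute/replace)" if minutes_out > TOLERANCE_MIN else "OK")
```

The "predictive intervention" in the table amounts to evaluating a rule like this continuously in transit rather than retrospectively from a downloaded log, so a replacement shipment can leave before the original arrives compromised.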
Economics and Business Models
Sponsors face a strategic choice: invest in building AI capabilities, purchase from vendors, or partner with AI-native CROs. Each path has different economic implications:
| Strategy | Upfront Investment | Ongoing Cost | Risk | Best For |
|---|---|---|---|---|
| Build Internal AI | $10-50M+ | High (talent, compute) | Technology obsolescence | Top 20 pharma |
| Buy Platform AI | $1-5M licensing | Medium (per-trial fees) | Vendor lock-in | Mid-size sponsors |
| Partner with AI-Native CRO | Minimal | Higher per-trial cost | Dependency on partner | Biotech, small sponsors |
| Hybrid Model | $5-20M | Medium | Complexity | Most large sponsors |
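The strategic choice is largely a fixed-versus-marginal-cost tradeoff, which a short calculation makes visible. The upfront and per-trial figures below are rough midpoints consistent with the table; the per-trial costs and trial volumes are assumed inputs, since portfolio size is what actually drives the crossover.

```python
# Sketch: five-year total cost of the sourcing strategies in the table above.
# Upfront/per-trial figures ($M) are assumed midpoints; trial volume is the
# decision variable that determines which strategy is cheapest.

def five_year_cost(upfront_m, per_trial_m, trials_per_year, years=5):
    return upfront_m + per_trial_m * trials_per_year * years

strategies = {
    "build internal AI": (30.0, 0.5),  # high fixed cost, low marginal cost
    "buy platform AI":   (3.0, 1.5),
    "AI-native CRO":     (0.0, 3.0),   # no fixed cost, highest per-trial cost
}

for trials in (2, 10):
    best = min(strategies, key=lambda s: five_year_cost(*strategies[s], trials))
    print(f"{trials} trials/year -> cheapest (under these assumptions): {best}")
```

Under these placeholder numbers, a small sponsor is better served buying or partnering while a large pipeline justifies building, which is consistent with the "Best For" column above.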
The 2030 Trial: A Vision
By 2030, a Phase III trial could look different in organizations that successfully integrate automation into validated workflows. The core idea is not that decision-making becomes “fully autonomous,” but that a larger fraction of routine coordination work becomes standardized, traceable, and partially automated: documents are assembled from structured sources rather than rewritten, operational signals are reconciled across systems of record rather than re-keyed into decks, and exceptions are routed with clear provenance rather than discovered late through periodic reconciliation.
This kind of operating model is easiest to imagine in contexts where three prerequisites are met. First, data capture and integration are mature enough that key events (screen failures, visit-window risk, temperature excursions, missing essential documents) appear as machine-readable signals. Second, organizations invest in validation, access controls, and audit trails so that automated steps remain inspectable and accountable under GCP and 21 CFR Part 11 expectations. Third, sites and vendors actually adopt the workflow, so that automation reduces burden rather than shifting it into more portals and alerts. With those constraints in mind, the scenario in Figure 23.9 is intentionally aspirational: it is meant to illustrate where cycle time and coordination cost might be reduced, not to predict a single inevitable trajectory.
timeline
title The 2030 AI-Native Trial Timeline
section Design (weeks)
Protocol AI : Simulates large virtual cohorts
: Optimizes I/E criteria automatically
: Generates protocol document
section Activation (weeks)
Site AI : Identifies high-performing sites
: Generates contracts, submits to IRBs in parallel
: Ships IP with predictive inventory
section Enrollment (months)
Matching AI : Scans large EHR cohorts
: Alerts physicians of eligible patients
: Patients consent via eConsent platform
section Conduct (months)
Operations AI : Monitors data in real-time
: Generates and resolves queries automatically
: Predicts and prevents protocol deviations
section Close (weeks)
Reporting AI : Locks database with automated reconciliation
: Generates statistical outputs
: Drafts Clinical Study Report
| Metric | 2025 Benchmark | 2030 Projection | Change |
|---|---|---|---|
| Time to First Patient | 166 days | 60 days | -64% |
| Enrollment Duration | 500+ days | 200 days | -60% |
| Daily Operational Cost | $40,000 | $18,000 | -55% |
| Total Phase III Cost | $100M | $45M | -55% |
| Human FTEs per Trial | 50-100 | 15-30 | -70% |
| Data Quality Issues | 15-20% query rate | 3-5% query rate | -75% |
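A useful sanity check on a projection table like this is to see what the daily-cost and duration rows alone imply. The sketch below multiplies them out under the simplifying assumption that operating burn scales with conduct duration; the large gap versus the $100M and $45M totals is a reminder that per-patient fees, drug supply, and fixed startup and close-out costs sit outside the daily burn.

```python
# Sketch: cross-check the projection table -- what do daily burn x duration
# imply on their own? Assumes operating cost ~ daily cost x conduct days,
# deliberately ignoring per-patient, supply, and fixed costs.

def implied_operating_cost_m(daily_cost, days):
    return daily_cost * days / 1e6

baseline = implied_operating_cost_m(40_000, 500)  # 2025 benchmark rows
future = implied_operating_cost_m(18_000, 200)    # 2030 projection rows
print(f"2025 implied operating burn: ${baseline:.0f}M; 2030: ${future:.1f}M")
```

The exercise shows that the daily burn accounts for only a slice of total Phase III cost, so the projected 55% total reduction requires savings across the other cost categories as well.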
Industry and Workforce Implications
The impact of AI adoption varies by stakeholder because costs, responsibilities, and bargaining power are unevenly distributed across the clinical trial enterprise. Some organizations bear the largest absolute operational costs and therefore capture most of the direct savings from efficiency improvements; others earn revenue precisely from the labor-intensive activities that automation targets, making the same improvements economically disruptive. Differences in regulatory exposure also matter: actors responsible for compliance, inspection readiness, and patient protection face higher requirements for validation, audit trails, and accountability, which can slow adoption and increase the fixed costs of implementation.
In practice, the distributional effects depend on where automation is deployed (site workflows, sponsor oversight, CRO service delivery, or platform infrastructure) and on whether new tools reduce burden or simply shift it across organizational boundaries. The table below summarizes these incentives at a high level.
| Stakeholder | Likely Position | Rationale |
|---|---|---|
| Large Sponsors | Positioned to benefit | Can afford AI investment; may capture efficiency gains if governance is effective |
| Biotech | Mixed | May benefit from lower trial costs, but less capital to invest in AI infrastructure |
| Traditional CROs | Facing pressure | Core services may be commoditized; business models may need to evolve |
| AI-Native CROs | Potentially advantaged | Built for automation; lower cost structure if quality can be demonstrated |
| Clinical Sites | Mixed | Less manual work, but also potentially less revenue per patient |
| Patients | Potentially benefit | May see faster access to trials, less burden, more decentralized options |
| Regulators | Adapting | Must develop AI validation frameworks while maintaining safety standards |
The clinical research workforce—coordinators, monitors, data managers—faces significant disruption. Work will shift toward roles focused on system implementation, validation, and oversight as routine tasks are reduced through automation, and the industry will likely need deliberate reskilling and role redesign to preserve domain expertise as workflows change.
The clinical trial technology ecosystem is moving in two directions at once. On the one hand, core systems are consolidating around a small number of platforms that can support regulated, end-to-end execution across EDC, CTMS, eTMF, and supply. On the other hand, sponsors are experimenting with a growing set of analytics and automation capabilities that sit “on top” of those systems and reshape day-to-day work in feasibility, monitoring, data cleaning, and documentation. The result is less a single wave of disruption than a gradual reallocation of investment: stability and standardization in the transactional backbone, with faster iteration at the analytics and workflow layer (Tufts Center for the Study of Drug Development 2024; Fortune Business Insights 2024):
| Segment | 2025 Market Size | 2030 Projection | CAGR | Key Trend |
|---|---|---|---|---|
| Clinical Trial Software | $2.4B | $4.9B | 15% | Cloud consolidation |
| eTMF Systems | $1.4B | $2.5B | 13% | AI-powered automation |
| AI in Clinical Trials | $3.8B | $55B | 46% | Rapid adoption |
| EDC Systems | $1.2B | $2.1B | 12% | EHR integration |
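Market tables like this one rest on the standard compound-annual-growth-rate identity, which is worth applying as a consistency check. Note the implied horizon: $3.8B compounding at 46% reaches roughly $55B only over about seven years, so the cited CAGR likely uses a base year earlier than 2025 in the underlying report.

```python
# Sketch: the CAGR identity behind market projections like the table above.
# CAGR = (end / start) ** (1 / years) - 1

def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

print(f"5-year implied CAGR: {cagr(3.8, 55, 5):.0%}")  # what 2025 -> 2030 would require
print(f"7-year implied CAGR: {cagr(3.8, 55, 7):.0%}")  # closer to the cited 46%
```

Running the identity both ways shows why base years matter when comparing growth figures across vendor reports.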
The IT stack has evolved from a passive repository to an active intelligence layer. When sponsors can integrate design, recruitment, and operations data in near real time, they can often detect operational risks earlier and reduce cycle time compared with fragmented legacy workflows.
Sponsors who adopt these technologies may be able to run trials faster and at lower operational cost, although the magnitude of improvement depends on disease area, protocol complexity, and the quality of implementation and governance.
A practical way to evaluate any “innovation” in this space is to ask three questions. First, what decision does it change, and what evidence shows that the change improves patient safety, data integrity, or development efficiency in the target context? Second, what are the failure modes, and how are they detected, documented, and corrected under GCP and inspection expectations? Third, how does it integrate into real workflows—including incentives across sponsors, CROs, and sites—so that it reduces burden rather than shifting it? Tools that answer those questions well are more likely to be adopted, defended, and sustained in regulated clinical research.