```mermaid
flowchart LR
subgraph Sites["Clinical Sites"]
EHR[Electronic Health Records]
Wearables[Wearables & Sensors]
Patient[Patient Portal]
end
subgraph Core["Core Data Platforms"]
EDC[EDC<br/>Electronic Data Capture]
CTMS[CTMS<br/>Clinical Trial Management]
eTMF[eTMF<br/>Trial Master File]
RTSM[RTSM<br/>Randomization & Supply]
end
subgraph AI["AI Layer"]
Design[Protocol Design AI]
Recruit[Patient Matching AI]
QC[Data Quality AI]
Predict[Predictive Analytics]
end
subgraph Outputs["Regulatory & Insights"]
Submit[Regulatory Submissions]
Reports[Real-time Dashboards]
Risk[Risk Signals]
end
EHR -->|Patient Data| EDC
Wearables -->|Biometrics| EDC
Patient -->|ePRO/eCOA| EDC
EDC <--> CTMS
CTMS <--> eTMF
CTMS <--> RTSM
EDC --> QC
CTMS --> Predict
eTMF --> QC
Design --> EDC
Recruit --> CTMS
QC --> Reports
Predict --> Risk
eTMF --> Submit
EDC --> Submit
```
# 23 AI in Trials
This chapter examines how technology and methodological innovation are reshaping clinical trials. The scope is intentionally broad, covering three interconnected developments: the IT infrastructure that now underpins trial operations, the AI and automation tools being layered on top of that infrastructure, and the emerging methodological approaches—real-world evidence, digital twins, synthetic controls—that are changing how trials are designed and how evidence is generated.
These topics belong together because they share a common theme: the shift from trials as primarily manual, paper-based endeavors to trials as data-intensive, computationally-mediated systems. The IT platforms determine what data can be captured and integrated. AI tools determine what can be automated or predicted. Methodological innovations determine what kinds of evidence regulators will accept. A decision-maker evaluating a trial strategy must understand all three.
The chapter proceeds as follows. We begin with the technology ecosystem—the core platforms (EDC, CTMS, eTMF, RTSM) that form the transactional backbone of modern trials, and the market dynamics shaping their evolution. We then examine AI applications across the trial lifecycle: study design and protocol optimization, patient recruitment and site selection, and operational data quality. Next, we address emerging methodological approaches—real-world evidence, federated learning, digital twins, and synthetic control arms—that represent alternatives or complements to traditional randomized controlled designs. Finally, we consider the economic and organizational implications: how these technologies may reshape cost structures, CRO relationships, and the clinical research workforce.
Throughout, we maintain a realistic perspective. Technology can reduce friction and surface risks earlier, but it requires validation, governance, and oversight to be defensible in an inspection. Methodological innovations can improve efficiency, but they introduce new sources of uncertainty and require careful regulatory engagement. The goal is to help readers understand not only what is changing, but how to evaluate whether a given tool or approach is appropriate for regulated clinical research.
Clinical trials now run on a layered digital infrastructure: core transactional systems (EDC, CTMS, eTMF, RTSM), integration middleware, and analytics/automation services that continuously convert operational events into risk signals and decisions. This shift has been accelerated by cloud adoption and by regulatory acceptance of technology-enabled trial conduct, including decentralized trial elements and digital health technologies for remote data acquisition (U.S. Food and Drug Administration 2024, 2023d).
The next transition is conceptual: from automation as a feature (dashboards, rules engines, and isolated machine-learning models) to automation as an operating model, in which software agents can plan, execute, and verify multi-step workflows across the clinical operations stack. This shifts the trial from a linear “pipeline” towards a dynamic, electronic map of activities, where information infrastructure identifies bottlenecks and navigates the “diverse web of iterative learning loops” that characterize modern drug development (Wagner et al. 2018). In this chapter, “agentic AI” is treated as a design pattern—built from “compound” systems that combine models with retrieval, tools, and control logic—grounded in foundational computer science work on “generative agents” that simulate human-like behavior and goal-directed action (Zhang, Chen, and Oney 2023; Berkeley Artificial Intelligence Research 2024). Recent systematic reviews of studies from 2024-2025 indicate that agentic systems can improve clinical task performance by up to 60 percentage points over base language models, particularly in evidence retrieval and task planning (Abou Ali, Dornaika, and Charafeddine 2026).
The clinical question is not whether agents can draft text, but whether they can operate within a regulated environment: preserving data integrity, producing audit-ready traces, and remaining under accountable human oversight—a framework anchored by the EMA’s 2024 reflection paper, which sets out risk-based expectations for the use of AI in drug development (Unlearn.AI 2025; European Medicines Agency 2024; National Institute of Standards and Technology 2023; Amershi et al. 2019). This technological shift converges with the vision of Digital Twin “Moonshots”, which aim to integrate personalized digital twins directly into medical records to optimize both clinical care and trial participation (Duke Center for Virtual Imaging Trials 2024).
The underlying market dynamics reflect a real shift in how trials are run. The clinical trial software market has grown substantially—estimated at over USD 11 billion in 2024—and continues to expand at a compound annual growth rate exceeding 10% (Grand View Research 2025). AI applications in clinical trials are growing rapidly (Fortune Business Insights 2024). These trends should not be taken as evidence of clinical benefit on their own, but they help explain why sponsors, CROs, and platform vendors are reorganizing workflows around automation and AI-mediated operations.
## 23.1 The Modern Clinical Trial Ecosystem
The clinical trial technology stack has evolved from disconnected tools into an integrated ecosystem that powers every stage of research, from site operations to regulatory submissions. A decade ago, sponsors managed clinical data through a patchwork of vendor systems that rarely communicated: EDC databases that could not talk to randomization systems, trial master files stored in SharePoint folders with manual indexing, and clinical trial management systems that required spreadsheet reconciliation to produce accurate enrollment counts. Data flowed through exports, imports, and emails—a process that introduced latency, transcription errors, and audit risk at every handoff.
Today, the leading platforms aspire to unified architectures where patient data, operational metrics, essential documents, and supply chain signals flow through shared data models. In the best-integrated environments—particularly single-vendor platforms like Veeva Vault—when a site randomizes a patient, that event can propagate automatically: enrollment counts update in CTMS dashboards, treatment-specific CRF pages unlock in the EDC, drug shipment requests trigger in the supply management system, and expected document checklists populate in the eTMF. In practice, many sponsors still operate heterogeneous stacks with systems from multiple vendors, connected through middleware and custom integrations that require configuration, maintenance, and periodic reconciliation. The degree of integration varies widely across organizations, but the direction of travel is toward reduced manual handoffs and the shared data infrastructure on which analytics and automation depend.
### The Technology Ecosystem
The flowchart at the opening of this chapter shows how the major systems interconnect.
The clinical trial technology stack consists of four interconnected core platforms, each serving a distinct but complementary function (Medidata Solutions 2024a).
Electronic Data Capture (EDC) is the primary tool for clinical data collection. EDC systems replace paper case report forms with validated electronic forms that capture patient data at the point of care. When a coordinator records a blood pressure reading, administers a questionnaire, or documents an adverse event, that data flows into the EDC. Modern EDC platforms include built-in edit checks that flag impossible values (a heart rate of 500?) or logical inconsistencies (an adverse event dated before the patient enrolled) in real time, catching errors before they propagate. The EDC database ultimately becomes the foundation for regulatory submissions—every efficacy and safety analysis traces back to data captured here (U.S. Food and Drug Administration 2023a).
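Conceptually, an edit check is a small predicate evaluated when data is entered. The sketch below illustrates the idea in Python with hypothetical field names and plausibility limits; real EDC platforms express these as validated, configuration-driven rules rather than ad hoc code:

```python
from datetime import date

# Illustrative plausibility limits -- real edit checks are protocol-specific
# and validated before go-live.
LIMITS = {"heart_rate": (20, 300), "systolic_bp": (50, 260)}

def check_range(field: str, value: float) -> list[str]:
    """Flag values outside physiologically plausible limits."""
    lo, hi = LIMITS[field]
    if not lo <= value <= hi:
        return [f"{field}={value} outside plausible range [{lo}, {hi}]"]
    return []

def check_ae_dates(ae_onset: date, enrollment: date) -> list[str]:
    """Flag a logical inconsistency: an adverse event cannot predate enrollment."""
    if ae_onset < enrollment:
        return [f"AE onset {ae_onset} precedes enrollment {enrollment}"]
    return []

# Both checks fire here, producing two queries for site review.
queries = check_range("heart_rate", 500) + check_ae_dates(
    date(2024, 1, 2), date(2024, 3, 1)
)
```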
Clinical Trial Management System (CTMS) is the operational command center. While EDC captures patient data, CTMS tracks trial operations: which sites are open, how many patients each has enrolled, when the next monitoring visit is scheduled, and what the budget burn rate looks like. CTMS provides the project management backbone that keeps a 50-site, 14-country trial from descending into chaos. It tracks milestones, manages contracts and payments, and generates the operational metrics that sponsors use to assess trial health (Grand View Research 2025).
Electronic Trial Master File (eTMF) is the regulatory archive. Every clinical trial generates thousands of documents: the protocol and its amendments, informed consent forms, IRB approvals, investigator CVs, monitoring reports, safety letters, and correspondence. Regulators require sponsors to maintain a complete Trial Master File as evidence that the trial was conducted properly. eTMF systems organize these documents according to the DIA Reference Model, track document completeness, and ensure inspection readiness (TMF Reference Model Initiative 2024). When an FDA inspector arrives, the eTMF is the first artifact they examine.
Randomization and Trial Supply Management (RTSM), sometimes called Interactive Response Technology (IRT), handles the logistics of treatment assignment and drug supply. When a patient is eligible for randomization, the RTSM system assigns them to a treatment arm according to the randomization scheme—maintaining the blind while ensuring balanced allocation. Simultaneously, RTSM tracks investigational product inventory at each site, triggers resupply shipments, and manages the complex logistics of getting the right drug to the right patient at the right time. For trials with temperature-sensitive biologics or personalized therapies, RTSM is mission-critical (Clinical Leader 2024).
These four systems do not operate in isolation—they exchange data continuously. When a patient is randomized in RTSM, that information flows to CTMS (updating enrollment counts) and EDC (enabling treatment-specific data collection). When a monitoring visit is completed, the report is filed in eTMF while the visit status updates in CTMS. This integration explains why unified platforms like Veeva Vault, which house all four systems in a single architecture, have gained such traction in the market (Veeva Systems 2024b).
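This event fan-out can be sketched as a publish/subscribe pattern. The topic name, handlers, and messages below are illustrative; production stacks rely on validated integration middleware rather than in-process callbacks:

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe bus, standing in for middleware."""
    def __init__(self):
        self.handlers = defaultdict(list)
    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)
    def publish(self, topic, event):
        for handler in self.handlers[topic]:
            handler(event)

bus = EventBus()
actions = []

# Each downstream system registers its own reaction to the same event.
bus.subscribe("patient.randomized", lambda e: actions.append(f"CTMS: enrollment +1 at {e['site']}"))
bus.subscribe("patient.randomized", lambda e: actions.append(f"EDC: unlock arm-specific CRFs for {e['subject']}"))
bus.subscribe("patient.randomized", lambda e: actions.append(f"RTSM: check resupply for {e['site']}"))
bus.subscribe("patient.randomized", lambda e: actions.append(f"eTMF: expect arm-specific documents for {e['subject']}"))

# A single randomization event propagates to all four systems.
bus.publish("patient.randomized", {"subject": "SUBJ-001", "site": "US-014"})
```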
## 23.2 Agentic AI in Clinical Workflows
As AI moves beyond static models, agentic systems are emerging that can navigate complex regulated workflows by coordinating multiple tools and reasoning steps.
### Building Blocks and System Architecture
The recent resurgence of “agents” is best understood as a systems shift. Rather than calling a model once, agentic systems orchestrate multiple calls, state, and tools to complete a task: they decompose goals, retrieve context, take actions, and iterate based on feedback (Cheng et al. 2024; Wang et al. 2024). The Berkeley AI Research perspective on “compound AI systems” provides a useful framing: reliability gains often come from engineered compositions—retrievers, checkers, constrained tool calls, and repeated sampling—rather than from a single monolithic model invocation (Berkeley Artificial Intelligence Research 2024).
Three technical ideas recur across modern agent systems. First, structured intermediate reasoning (e.g., chain-of-thought prompting) can improve performance on complex tasks, though it does not guarantee correctness in the presence of missing or stale information (Wei et al. 2022). Second, retrieval-augmented generation (RAG) externalizes “memory” into a maintained knowledge base and can provide provenance when paired with citations to retrieved sources (Lewis et al. 2020; Douze et al. 2024). Third, coupling reasoning with tool use—where a model decides when to query, calculate, or fetch—yields more controllable trajectories than purely generative text, as formalized in approaches that interleave reasoning and acting (Yao et al. 2023).
In clinical operations, these building blocks map to concrete needs: pulling the right protocol amendment from an eTMF, verifying whether a site has an updated IRB approval, or checking that a safety narrative is consistent with the underlying case data. Each is fundamentally a retrieval + verification problem under audit constraints—not a “creative writing” task.
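A minimal sketch of retrieval with provenance, using a toy word-overlap scorer and an invented three-document corpus (real systems use embeddings, vector indexes, and validated document stores):

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by term overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc_id: -len(terms & set(corpus[doc_id].lower().split())))
    return ranked[:k]

def answer_with_provenance(query: str, corpus: dict[str, str]) -> dict:
    """Return retrieved context together with document IDs, so any
    downstream generation step can cite its sources."""
    doc_ids = retrieve(query, corpus)
    return {"context": " ".join(corpus[d] for d in doc_ids), "citations": doc_ids}

# Invented corpus standing in for an eTMF index.
corpus = {
    "amendment-3": "protocol amendment 3 updates the eligibility criteria",
    "irb-2024": "IRB approval renewed for site 014 in 2024",
    "sop-dm": "data management SOP for query handling",
}
result = answer_with_provenance("which amendment updates eligibility criteria", corpus)
```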
Operationalizing agents also requires consistent interfaces to external systems. Tool calling standards such as the Model Context Protocol (MCP) aim to standardize how AI applications connect to data sources and actions, turning “integration” into a first-class design surface (Anthropic 2025). In regulated contexts, the value is not novelty; it is the ability to enforce access controls, log every tool invocation, and make agent actions reviewable.
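In such settings the control points matter more than the plumbing. A sketch of what a gated, logged tool invocation might look like, with invented tool and agent names and an in-memory list standing in for a durable audit store:

```python
import json
import time

AUDIT_LOG: list[dict] = []          # stand-in for a durable, append-only store
ALLOWED_TOOLS = {"etmf_lookup"}     # illustrative role-based allow-list

def call_tool(agent_id: str, tool: str, args: dict) -> dict:
    """Gate every agent tool call behind an allow-list and log the attempt,
    so each action is attributable and reviewable after the fact."""
    entry = {"ts": time.time(), "agent": agent_id, "tool": tool,
             "args": json.dumps(args, sort_keys=True)}
    if tool not in ALLOWED_TOOLS:
        entry["outcome"] = "denied"
        AUDIT_LOG.append(entry)
        raise PermissionError(f"tool {tool!r} is not permitted for this agent")
    entry["outcome"] = "executed"
    AUDIT_LOG.append(entry)
    return {"tool": tool, "status": "ok"}  # a real dispatcher would run the tool here

call_tool("agent-dm-01", "etmf_lookup", {"doc": "IRB approval, site 014"})
try:
    call_tool("agent-dm-01", "database_write", {"table": "adverse_events"})
except PermissionError:
    pass  # denied attempts are still logged
```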
Clinical workflows are naturally multi-role (data management, monitoring, safety, regulatory, sites). Multi-agent frameworks formalize this by allocating roles and coordinating conversations among specialized agents (Wu et al. 2023). However, empirical analyses show that multi-agent systems often fail in predictable ways: specification gaps, coordination failures, and weak verification loops (Cemri et al. 2025). From an engineering perspective, this implies that “agent quality” must be evaluated not only by task accuracy, but also by cost, robustness, and reproducibility (Kapoor et al. 2024). Human-in-the-loop debugging and steering tools are emerging for these systems, reflecting the practical need to inspect and edit multi-step traces rather than treating outputs as black boxes (Epperson et al. 2025).
As soon as agents run concurrently across multiple workflows, operational concerns become first-order: context windows become a managed resource, tools must be scheduled and access-controlled, and traces must be stored in an auditable way. Emerging “agent runtime” proposals make these concerns explicit by separating agent applications from shared services such as scheduling, context management, and access control (Mei et al. 2024). Similarly, pipeline frameworks that treat agent workflows as composable graphs support systematic optimization and regression testing across versions—important for any environment where changes must be validated rather than “shipped and hoped” (Khattab et al. 2023).
Evaluation is also moving beyond single-task accuracy. Benchmarks emphasize realistic multi-step work with tool use and resource constraints, and propose clearer reporting of cost and reproducibility (Chan et al. 2024; Cappello et al. 2025).
### Human Oversight and Limitations
Finally, agentic systems can create an illusion of competence: fluent outputs can be mistaken for validated decisions. This is a known socio-technical risk in scientific work, where productivity gains may coexist with a decline in genuine understanding and critical scrutiny (Messeri and Crockett 2024). Human–AI interaction guidelines emphasize making uncertainty visible, supporting oversight and correction, and ensuring users understand system limitations—principles that map directly to quality management and inspection readiness (Amershi et al. 2019).
## 23.3 Regulatory Framework for AI in Clinical Trials
For any sponsor or CRO considering AI applications in clinical trials, understanding the regulatory landscape is essential. This section provides a practical framework for determining what can and cannot be done with AI, what documentation and oversight are required, and when to engage regulators.
### Enforceable Standards vs. Recommendations
A critical distinction exists between enforceable legal requirements and regulatory guidance. Failure to comply with enforceable requirements can result in warning letters, clinical holds, or rejection of submissions. Guidance documents represent FDA or EMA “current thinking” and are recommendations, not mandates—though departing from them without justification invites scrutiny.
| Category | Examples | Consequence of Non-Compliance |
|---|---|---|
| Enforceable Law/Regulation | 21 CFR Part 11 (electronic records), 21 CFR Part 312 (INDs), ICH E6 GCP, EU Clinical Trials Regulation | Warning letters, clinical holds, application rejection, criminal liability |
| Enforceable GxP Standards | ICH E6(R3) computerized systems requirements, Annex 11 (EU), data integrity requirements | Inspection findings, Form 483 observations, regulatory action |
| Regulatory Guidance | FDA draft guidance on AI for regulatory decisions (Jan 2025), EMA reflection paper on AI (Sept 2024) | Increased scrutiny, requests for additional information, delays |
### The 10 Guiding Principles for Good AI Practice
In January 2026, FDA (CDER and CBER) and EMA jointly published Guiding Principles of Good AI Practice in Drug Development—the first joint regulatory statement establishing foundational expectations for AI across the drug product lifecycle (U.S. Food and Drug Administration and European Medicines Agency 2026). While not legally binding, these principles represent international consensus on what “good practice” means for AI in drug development.
The following is quoted verbatim from the joint FDA-EMA document:
- **Human-centric by design** The development and use of AI technologies align with ethical and human-centric values.
- **Risk-based approach** The development and use of AI technologies follow a risk-based approach with proportionate validation, risk mitigation, and oversight based on the context of use and determined model risk.
- **Adherence to standards** AI technologies adhere to relevant legal, ethical, technical, scientific, cybersecurity, and regulatory standards, including Good Practices (GxP).
- **Clear context of use** AI technologies have a well-defined context of use (role and scope for why it is being used).
- **Multidisciplinary expertise** Multidisciplinary expertise covering both the AI technology and its context of use are integrated throughout the technology’s life cycle.
- **Data governance and documentation** Data source provenance, processing steps, and analytical decisions are documented in a detailed, traceable, and verifiable manner, in line with GxP requirements. Appropriate governance, including privacy and protection for sensitive data, is maintained throughout the technology’s life cycle.
- **Model design and development practices** The development of AI technologies follows best practices in model and system design and software engineering and leverages data that is fit-for-use, considering interpretability, explainability, and predictive performance. Good model and system development promotes transparency, reliability, generalizability, and robustness for AI technologies contributing to patient safety.
- **Risk-based performance assessment** Risk-based performance assessments evaluate the complete system including human-AI interactions, using fit-for-use data and metrics appropriate for the intended context of use, supported by validation of predictive performance through appropriately designed testing and evaluation methods.
- **Life cycle management** Risk-based quality management systems are implemented throughout the AI technologies’ life cycles, including to support capturing, assessing, and addressing issues. The AI technologies undergo scheduled monitoring and periodic re-evaluation to ensure adequate performance (e.g., to address data drift).
- **Clear, essential information** Plain language is used to present clear, accessible, and contextually relevant information to the intended audience, including users and patients, regarding the AI technology’s context of use, performance, limitations, underlying data, updates, and interpretability or explainability.
These principles operationalize the core regulatory expectation: AI in drug development must be validated, documented, and maintained with the same rigor as any other regulated activity—but with additional attention to the unique characteristics of AI systems, including their data-dependency, potential opacity, and tendency to drift over time.
### The FDA Risk-Based Credibility Framework
FDA’s January 2025 draft guidance “Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products” establishes a seven-step risk-based framework for AI applications in drug development. While the guidance is not legally binding, it represents FDA’s expected approach for evaluating AI-generated evidence.
The FDA guidance applies when AI is used to produce information or data intended to support regulatory decision-making regarding safety, effectiveness, or quality. It does not apply to:
- AI used in drug discovery (before IND)
- AI used purely for operational efficiencies (e.g., internal workflows, resource allocation) that do not impact patient safety, drug quality, or study reliability
The Seven-Step Credibility Assessment Process:
- Define the Question of Interest: What specific question, decision, or concern is the AI model addressing?
- Define the Context of Use (COU): What is the specific role and scope of the AI model? Will other evidence be used alongside it?
- Assess Model Risk: Combine model influence (contribution of AI evidence relative to other evidence) with decision consequence (significance of adverse outcomes from incorrect decisions)—see Figure 23.2 for how common AI applications map onto this two-dimensional risk space
- Develop a Credibility Assessment Plan: Document model architecture, training data, evaluation methods, and performance metrics—with rigor proportional to model risk
- Execute the Plan: Implement the credibility assessment activities
- Document Results: Prepare a credibility assessment report with any deviations from the plan
- Determine Adequacy: Evaluate whether model credibility is sufficient for the COU
```mermaid
quadrantChart
title AI Model Risk Assessment
x-axis Low Model Influence --> High Model Influence
y-axis Low Decision Consequence --> High Decision Consequence
quadrant-1 High Risk
quadrant-2 Medium Risk
quadrant-3 Low Risk
quadrant-4 Medium Risk
"Patient stratification (sole determinant)": [0.85, 0.90]
"Eligibility screening support": [0.40, 0.50]
"Document classification": [0.30, 0.20]
"Data quality triage": [0.45, 0.35]
"Dosing recommendations": [0.75, 0.85]
"Protocol complexity scoring": [0.35, 0.30]
```
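The two-axis combination in step 3 can be encoded as a simple lookup. The mapping below illustrates the quadrant logic only; the draft guidance describes a matrix-style judgment, not fixed categorical cutoffs:

```python
def model_risk(model_influence: str, decision_consequence: str) -> str:
    """Map the two axes of step 3 to a risk tier ("low"/"high" inputs).
    Illustrative quadrant logic, not thresholds taken from the guidance."""
    high_influence = model_influence == "high"
    high_consequence = decision_consequence == "high"
    if high_influence and high_consequence:
        return "high"
    if high_influence or high_consequence:
        return "medium"
    return "low"

# Examples roughly mirroring Figure 23.2:
sole_stratifier = model_risk("high", "high")  # AI as sole determinant of stratification
doc_classifier = model_risk("low", "low")     # document classification with human review
```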
### What You CAN and CANNOT Do with AI in Clinical Trials
The following table synthesizes enforceable requirements and regulatory expectations from FDA guidance, EMA reflection papers, and ICH E6(R3). Applications are categorized by risk level, with specific requirements and limitations for each.
| AI Application | Regulatory Status | Requirements | Limitations |
|---|---|---|---|
| Document classification and TMF filing | Permitted with oversight | Validate classification accuracy; maintain human review for inspection-critical documents | Cannot replace human accountability for TMF completeness |
| Data quality screening and query generation | Permitted with oversight | Document AI logic; human review of generated queries before sending to sites | Cannot auto-close queries without human verification |
| Patient matching/eligibility pre-screening | Permitted as decision support | Validate against eligibility criteria; investigator makes final determination | Cannot make final eligibility decisions—investigator responsibility under GCP |
| Protocol complexity scoring | Permitted | Document methodology; validate predictions against historical data | Operational tool only—no regulatory submission required |
| Site selection and feasibility | Permitted | Document data sources and model logic | Operational tool; sponsor retains responsibility for site qualification |
| Adverse event case processing | Permitted with lifecycle monitoring | Monitor model performance; human review of serious/unexpected cases | Cannot replace pharmacovigilance qualified person oversight |
| Statistical analysis endpoints | Permitted with pre-specification | Pre-specify in SAP; freeze model before database lock; prospective validation required for high-impact uses | Cannot modify model after unblinding; post hoc AI analysis is exploratory only |
| Primary endpoint assessment | Conditional—high regulatory scrutiny | EMA: prospective testing with newly acquired data required; FDA: credibility assessment proportional to risk | Model must be frozen and fully documented in SAP |
| Dosing/treatment assignment decisions | Conditional—requires early FDA/EMA engagement | Full credibility assessment; human-in-the-loop required; extensive safety monitoring | Cannot be sole determinant without validated safety controls |
| Replacing clinical judgment | Not permitted | — | AI supports decisions; investigators and physicians retain accountability under GCP |
Under ICH E6 GCP and FDA regulations, the investigator is responsible for medical decisions affecting trial participants, and the sponsor is responsible for trial conduct and data integrity. AI can support these decisions but cannot replace the accountable human. An investigator cannot defend a protocol deviation by stating “the AI said it was acceptable.”
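One practical way to demonstrate that a model was frozen before database lock, as the table above requires for statistical analysis uses, is to record a cryptographic fingerprint of the model artifact and its pre-processing configuration in the SAP. A sketch, with placeholder bytes standing in for real serialized weights:

```python
import hashlib
import json

def freeze_fingerprint(model_bytes: bytes, preprocessing: dict) -> str:
    """Fingerprint the model artifact together with its pre-processing
    configuration; the hex digest can be recorded in the SAP and re-checked
    at analysis time to show nothing changed after database lock."""
    digest = hashlib.sha256()
    digest.update(model_bytes)
    digest.update(json.dumps(preprocessing, sort_keys=True).encode())
    return digest.hexdigest()

config = {"impute": "median", "scale": "zscore"}               # illustrative pipeline config
fp_locked = freeze_fingerprint(b"model-weights-v1", config)    # recorded in the SAP
fp_checked = freeze_fingerprint(b"model-weights-v1", config)   # recomputed at analysis
fp_modified = freeze_fingerprint(b"model-weights-v2", config)  # any change is visible
```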
### Validation and Documentation Requirements
AI systems used in clinical trials must meet the same computerized systems validation requirements as any other regulated software. The applicable standards depend on the regulatory jurisdiction and the system’s role.
21 CFR Part 11 (FDA) and Annex 11 (EU) Requirements:
Both frameworks require that computerized systems generating, modifying, or storing electronic records for regulatory submissions meet validation and control standards:
- System validation: Documented evidence that software performs as intended, including AI model verification
- Audit trails: Computer-generated, time-stamped trails recording all system actions that create, modify, or delete electronic records
- Access controls: Unique user identification, authentication, and role-based permissions
- Data integrity: Controls ensuring data is attributable, legible, contemporaneous, original, and accurate (ALCOA+)
- Operational controls: Documented procedures for system use, maintenance, and change control
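The audit-trail requirement can be illustrated with a hash-chained log, where each entry commits to its predecessor so that silent edits become detectable on verification. This is a simplified illustration of the principle, not a Part 11 implementation:

```python
import hashlib
import json
from datetime import datetime, timezone

def append_entry(trail: list, user: str, action: str, record_id: str) -> dict:
    """Append a time-stamped entry whose hash covers its content and the
    previous entry's hash, forming a tamper-evident chain."""
    prev_hash = trail[-1]["hash"] if trail else "0" * 64
    body = {"ts": datetime.now(timezone.utc).isoformat(), "user": user,
            "action": action, "record": record_id, "prev": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    trail.append(body)
    return body

def verify(trail: list) -> bool:
    """Recompute every hash and link; any edited entry breaks the chain."""
    for i, entry in enumerate(trail):
        expected_prev = trail[i - 1]["hash"] if i else "0" * 64
        body = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != expected_prev or recomputed != entry["hash"]:
            return False
    return True

trail: list[dict] = []
append_entry(trail, "coordinator-7", "modify", "VS-0042")
append_entry(trail, "monitor-2", "verify", "VS-0042")
intact = verify(trail)
trail[0]["user"] = "someone-else"        # simulate tampering with a past entry
tamper_detected = not verify(trail)
```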
ICH E6(R3) Computerized Systems Requirements (January 2025):
The updated GCP guideline adds specific requirements relevant to AI:
- Fitness-for-purpose: Systems must be validated to be fit for the specific use in the trial
- Data governance: Dedicated section requiring documented data and records management
- Risk-based validation: Proportional approach based on impact on patient safety and data reliability
- Metadata and automated sources: Recognition of data from wearables, sensors, and automated systems as primary source data
| Requirement | 21 CFR Part 11 | Annex 11 | ICH E6(R3) |
|---|---|---|---|
| System validation | Required | Required | Required (risk-based) |
| Audit trails | Required | Required | Required |
| Access controls | Required | Required | Required |
| Change control | Required | Required | Required |
| Data backup/recovery | Required | Required | Required |
| Training documentation | Required | Required | Required |
| Supplier qualification | — | Required | Required |
### AI Model Lifecycle Maintenance
Unlike static software, AI models may degrade or drift over time as input data distributions change. Both FDA and EMA guidance emphasize lifecycle maintenance for AI models deployed over extended periods.
When Lifecycle Maintenance is Required:
- AI models used in manufacturing (e.g., quality control, process optimization)
- AI models used in pharmacovigilance (e.g., case classification, signal detection)
- Any AI system where model performance may change with new data inputs
Lifecycle Maintenance Activities:
- Performance monitoring: Define metrics and thresholds; monitor on risk-based frequency
- Drift detection: Identify when input data diverges from training data distribution
- Revalidation triggers: Pre-define conditions requiring model retesting or retraining
- Change management: Evaluate all model changes through pharmaceutical quality system
- Regulatory notification: Report changes impacting model performance per applicable requirements (e.g., 21 CFR 314.70)
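Drift detection is often operationalized with distribution-comparison statistics. The sketch below computes a Population Stability Index (PSI) between a training-era sample and current inputs; the 0.1/0.25 interpretation bands are a common industry rule of thumb, not a regulatory requirement:

```python
import math

def psi(expected: list[float], observed: list[float], bins: int = 5) -> float:
    """Population Stability Index between a training-era sample ("expected")
    and current inputs ("observed"). Common rule of thumb: < 0.1 stable,
    0.1-0.25 moderate drift, > 0.25 investigate/retrain."""
    lo, hi = min(expected), max(expected)
    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[max(0, min(idx, bins - 1))] += 1
        # Laplace smoothing avoids log(0) for empty bins.
        return [(c + 1) / (len(sample) + bins) for c in counts]
    e, o = bin_fractions(expected), bin_fractions(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))

baseline = [i / 100 for i in range(100)]       # training-era distribution
shifted = [0.7 + i / 500 for i in range(100)]  # current inputs cluster high
stable_score = psi(baseline, baseline)  # identical samples: no drift
drift_score = psi(baseline, shifted)    # well past the "action" band
```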
For pharmacovigilance applications, EMA permits a “more flexible approach” in which incremental learning can continuously enhance models. However, the marketing authorization holder (MAH) retains responsibility to validate, monitor, and document model performance as part of the pharmacovigilance system.
### When to Engage Regulators
Early engagement with FDA or EMA is strongly recommended for AI applications with high regulatory impact or high patient risk. The table below summarizes engagement pathways.
| AI Use Case | Recommended Engagement | FDA Contact | EMA Contact |
|---|---|---|---|
| Novel clinical trial design using AI | CDER C3TI or CID Meeting Program | CDERclinicaltrialinnovation@fda.hhs.gov | Innovation Task Force (ITF) |
| AI for endpoint evaluation | Drug Development Tools (DDT) or ISTAND | CDERBiomarkerQualificationProgram@fda.hhs.gov | SAWP qualification |
| AI-enabled digital health technology | DHT Program | DHTsforDrugDevelopment@fda.hhs.gov | — |
| AI in pharmacovigilance | Emerging Drug Safety Technology Program | AIMLforDrugDevelopment@fda.hhs.gov | PRAC interaction |
| AI in manufacturing | Emerging Technology Program (CDER ETP) | CDERETT@fda.hhs.gov | — |
| Model-informed drug development | MIDD Paired Meeting Program | MIDD@fda.hhs.gov | SAWP scientific advice |
### EMA-Specific Considerations
The EMA reflection paper (September 2024) introduces terminology and expectations that differ somewhat from FDA:
- High patient risk: AI uses affecting patient safety (e.g., dosing, treatment assignment)
- High regulatory impact: AI uses substantially affecting regulatory decisions (e.g., primary endpoint analysis)
- Risk-based approach: Rigor of credibility assessment should be proportional to risk level
Key EMA Positions:
Transparent models preferred: “The use of transparent models is preferred” to strengthen accountability. Black box models may be acceptable if transparent models show unsatisfactory performance, with additional documentation and monitoring requirements.
Frozen models for pivotal trials: “Prior to the database lock and subsequent unblinding…the data pre-processing pipeline and all models should be frozen and documented in a traceable manner in the statistical analysis plan.”
Prospective validation for high-impact uses: “For inference in late-stage clinical development…performance should be tested with prospectively generated data (future calendar time) that is acquired in a setting or population representative of the intended context of use.”
No incremental learning in pivotal trials: “Incremental learning approaches are not accepted, and any modification of the model during the trial requires a regulatory interaction.”
Human-in-the-loop for precision medicine: AI-driven indication or posology recommendations are “high patient risk as well as high regulatory impact” and require “fall-back treatment strategies in cases of technical failure.”
### Practical Implications: A Decision Framework
For sponsors evaluating whether and how to deploy AI in a clinical trial, Figure 23.3 provides a practical decision framework:
flowchart TD
A[Proposed AI Application] --> B{Does AI produce data/information<br/>for regulatory decisions?}
B -->|No - Operational only| C[Lower regulatory burden<br/>Document for inspection readiness]
B -->|Yes| D{What is the decision consequence<br/>if AI output is incorrect?}
D -->|Low| E[Low-risk application<br/>Standard validation<br/>Document methodology]
D -->|Medium/High| F{What is the AI model influence?}
F -->|Low - other evidence<br/>also used| G[Medium-risk application<br/>Proportional credibility assessment<br/>Human oversight of outputs]
F -->|High - AI is primary<br/>or sole evidence| H[High-risk application<br/>Full credibility assessment<br/>Early regulatory engagement<br/>Prospective validation<br/>Pre-specified in SAP]
C --> I[Proceed with GxP-compliant<br/>documentation and oversight]
E --> I
G --> J[Develop credibility assessment plan<br/>Consider regulatory feedback]
H --> K[Engage FDA/EMA before deployment<br/>Full 7-step credibility process]
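The triage logic in Figure 23.3 can be written as a small function. This is a minimal sketch of the two-factor risk reasoning (decision consequence × model influence); the tier names and thresholds are illustrative, not taken from any FDA or EMA guidance text.

```python
# Illustrative sketch of the Figure 23.3 triage. Names are hypothetical.

def classify_ai_risk(regulatory_use: bool, consequence: str, influence: str) -> str:
    """Map an AI application to a risk tier.

    consequence: 'low', 'medium', or 'high' -- impact if the output is wrong.
    influence:   'low' or 'high' -- whether AI is the primary/sole evidence.
    """
    if not regulatory_use:
        return "operational"   # lower burden; document for inspection readiness
    if consequence == "low":
        return "low-risk"      # standard validation, documented methodology
    if influence == "low":
        return "medium-risk"   # proportional credibility assessment, human oversight
    return "high-risk"         # full credibility assessment, early engagement

# A site-scheduling assistant whose output never feeds a regulatory decision:
print(classify_ai_risk(regulatory_use=False, consequence="low", influence="low"))
# An AI model used as the primary evidence in an endpoint analysis:
print(classify_ai_risk(regulatory_use=True, consequence="high", influence="high"))
```

The point of encoding the framework this way is that the first question (does the output feed a regulatory decision?) dominates everything downstream, which matches how both agencies scope their expectations.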
Summary: The Regulatory Bottom Line
The regulatory framework for AI in clinical trials can be summarized in five principles:
AI supports, it does not replace: Human accountability for medical decisions and regulatory compliance cannot be delegated to AI systems. Investigators, sponsors, and MAHs retain responsibility.
Risk determines rigor: The rigor of validation, documentation, and oversight should be proportional to model risk—a combination of decision consequence and model influence.
Pre-specification is essential for confirmatory evidence: AI models used to generate evidence for regulatory submissions must be frozen and documented in the statistical analysis plan before database lock. Post hoc AI analysis is exploratory only.
Lifecycle maintenance is required for deployed models: AI systems operating over time (manufacturing, pharmacovigilance) require ongoing performance monitoring, drift detection, and change management.
Early engagement de-risks novel applications: For high-risk AI applications, early consultation with FDA or EMA can align expectations and prevent costly late-stage objections.
23.4 The Foundational “Backbone”: Platform Wars
For decades, clinical data lived in silos—spreadsheets here, PDFs there, fax machines everywhere. Today, unified platforms serve as the operating system for clinical research. The clinical trial platform market is dominated by a handful of enterprise players, with intense competition driving innovation (Medidata Solutions 2024a):
| Vendor | Primary Strengths | Market Position | Cloud Model | AI Capabilities |
|---|---|---|---|---|
| Medidata (Dassault) | Industry-standard EDC (Rave), 25-year track record, 36,000+ studies | Market leader in EDC | Cloud/SaaS | AI-powered signal detection, synthetic control arms |
| Veeva Systems | Unified Vault platform (eTMF, CTMS, EDC), life sciences focus | Fast-growing challenger | Cloud-native | TMF Intake Agent, Quality Check Agent |
| Oracle | Enterprise scale, Siebel Clinical One, regulatory expertise | Established incumbent | Cloud/On-prem | ML-based safety analytics |
| IQVIA | Real-world data integration, global CRO services | CRO-integrated platform | Cloud/SaaS | Intelligent eTMF, predictive enrollment |
Over 57% of new clinical trial system deployments are now cloud-based, up from 30% five years ago (International Data Corporation 2024). The pandemic accelerated this shift.
Major Platform Vendors
Medidata (acquired by Dassault Systemes in 2019) remains the dominant EDC platform, with its Rave EDC system recognized as the industry standard. The 2025 ISR Benchmarking Report ranked Rave EDC as the top-preferred EDC system based on independent sponsor evaluations. Medidata’s scale is substantial: over 700,000 certified site users, 1.8 million EDC users, and more than 36,000 studies managed across all phases and therapeutic areas (Medidata Solutions 2024b).
Medidata’s AI capabilities include Acorn AI, which provides synthetic control arms using historical patient data to reduce or eliminate placebo groups in certain trial designs. Their Sensor Cloud integrates wearable device data directly into the EDC, enabling continuous physiological monitoring without manual data entry.
Veeva has rapidly gained market share by offering a unified Vault platform that integrates eTMF, CTMS, and EDC in a single system—a contrast to Medidata’s historically modular approach. Veeva’s exclusive focus on life sciences (unlike Oracle or Salesforce, which serve multiple industries) has allowed deep specialization. Beyond its core platform, Veeva is deploying specialized AI Agents that automate the most tedious parts of clinical operations. Figure 23.4 illustrates the document processing workflow:
sequenceDiagram
participant Site as Site Upload
participant Intake as TMF Intake Agent
participant QC as Quality Check Agent
participant TMF as eTMF Vault
participant User as Document Manager
Site->>Intake: Upload document (PDF/scan)
Intake->>Intake: Extract metadata<br/>(investigator, date, type)
Intake->>Intake: Classify to DIA artifact
Intake->>QC: Route for quality check
QC->>QC: Check for signatures
QC->>QC: Validate completeness
alt Document Complete
QC->>TMF: File to correct binder
TMF->>User: Notification: "Document filed"
else Issues Found
QC->>User: Alert: "Missing signature"
User->>Site: Request correction
end
Veeva’s AI capabilities center on two key agents. The TMF Intake Agent automatically classifies documents uploaded by sites, extracting metadata such as investigator name and document date to route files to the correct TMF binder. The Quality Check Agent reviews documents for errors—missing signatures, wrong versions, incomplete forms—before a human ever sees them, reducing TMF backlog by up to 80% according to Veeva’s published benchmarks (Veeva Systems 2024a).
The eTMF market alone is worth $1.4 billion and growing at 12.8% annually (MarketsandMarkets 2024). Three vendors are competing for dominance:
| Feature | Veeva eTMF | IQVIA eTMF | Phlexglobal eTMF |
|---|---|---|---|
| Auto-Classification | AI-powered DIA mapping | ML-based indexing | Intelligent auto-filing |
| Completeness Prediction | Expected document lists | Milestone-based gaps | Risk-based prioritization |
| Inspection Readiness | Real-time dashboards | Inspection-ready reports | Audit trail analytics |
| Site Integration | SiteVault connected | Site-facing portal | Sponsor-site bridge |
| Unique Strength | Unified Vault ecosystem | RWD integration | eTMF-specialist focus |
These platforms use AI in two ways. First, auto-indexing uses machine learning models to classify unorganized scans into the DIA Reference Model structure. Second, completeness prediction algorithms identify missing documents based on study milestones—for example, flagging that “Site 101 has been initiated but is missing a financial disclosure form” (IQVIA 2024; Phlexglobal 2024).
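The completeness-prediction idea reduces to a set difference: each milestone implies an expected document list, and anything expected but not filed is a gap. A toy sketch, where the milestone-to-artifact map is illustrative and not the DIA Reference Model:

```python
# Toy milestone-driven completeness check. The mapping below is invented
# for illustration; real systems derive it from the DIA Reference Model.

EXPECTED_BY_MILESTONE = {
    "site_initiated": {"1572/IRB approval", "CV", "financial disclosure form"},
    "first_patient_in": {"signed protocol", "delegation log"},
}

def missing_documents(milestones_reached, documents_filed):
    expected = set()
    for m in milestones_reached:
        expected |= EXPECTED_BY_MILESTONE.get(m, set())
    return sorted(expected - set(documents_filed))

gaps = missing_documents(
    milestones_reached=["site_initiated"],
    documents_filed=["CV", "1572/IRB approval"],
)
print(gaps)  # → ['financial disclosure form']
```

The ML contribution in production systems is upstream of this check: classifying scans so that "documents_filed" is accurate in the first place.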
Medable took a different path—building for the decentralized trial from day one. As hybrid and virtual trials became mainstream, Medable’s modular platform enables patients to participate from home (Medable 2024a):
| Capability | What It Does | Impact |
|---|---|---|
| TeleVisit | Video conferencing for remote assessments | Reduced travel for suitable protocols (implementation-dependent) |
| eConsent | Multimedia-rich digital consent | Improved comprehension and workflow consistency (context-dependent) |
| Medable AI | Generates digital eCOA from paper protocols | Faster digitization and reuse of instruments (vendor-reported) |
| TMF Automation | Processes DCT-generated document flood | Helps manage higher document volume (protocol-dependent) |
23.5 AI across the Development Lifecycle
AI capabilities are now being applied across every phase of clinical development—from protocol design through recruitment, operations, and data management. This section examines specific tools and workflows at each stage, focusing on where automation delivers measurable value and where human oversight remains essential.
AI in Protocol Design
Benchmarking studies suggest that most clinical trials fail to meet planned enrollment timelines, often because of design choices baked in before the first patient is screened (Lamberti et al. 2024a). AI tools now address this problem at the protocol design stage, as illustrated in Figure 23.5.
flowchart LR
subgraph Input["Inputs"]
RWD[Real-World Data<br/>large cohorts]
Hist[Historical Trials<br/>Protocol library]
Reg[Regulatory Requirements]
end
subgraph AI["AI Analysis"]
Sim[Patient Simulation]
Burden[Burden Scoring]
Feasibility[Site Feasibility]
end
subgraph Output["Outputs"]
Protocol[Optimized Protocol]
SoA[Schedule of Activities]
Sites[Recommended Sites]
end
RWD --> Sim
Hist --> Burden
Reg --> Protocol
Sim --> Feasibility
Burden --> Protocol
Feasibility --> Sites
Protocol --> SoA
| Tool | Data Source | Primary Use Case | Key Metric |
|---|---|---|---|
| Faro Health | Protocol library + RWD | Operational burden prediction | Complexity score |
| Phesi | 100M+ patient profiles | Enrollment simulation | Patient availability |
| Medidata AI | Historical trial data | Protocol optimization | Predicted enrollment rate |
| TrialSpark | Site network data | Site selection | Per-site enrollment probability |
Faro Health exemplifies this new approach. Instead of writing a static Word document, study teams design the trial in a structured cloud platform. The AI predicts operational burden by scoring the complexity of the schedule of assessments against real-world data, visualizes patient burden by identifying visits that require too many procedures, and generates documents by automating creation of the protocol and Schedule of Activities (Faro Health 2024).
Phesi takes simulation even further, leveraging data from over 100 million patients to model trial outcomes before finalizing the protocol. Their Digital Patient Profile reduces the likelihood of the “zero-enrollment site” problem mentioned in Chapter 16.
AI in Recruitment and Site Selection
Finding the right patients remains the perennial bottleneck. Benchmarking studies suggest that recruitment and retention often take substantially longer and cost more than planned, and that each day of delay can be economically material for sponsors in some therapeutic areas (Lamberti et al. 2024b; Deloitte Centre for Health Solutions 2025). AI tools now scan electronic health records to find “needle in the haystack” candidates—and, when integrated into workflow, can reduce the manual effort required to identify potentially eligible participants. Figure 23.6 shows the typical pipeline:
flowchart LR
subgraph Sources["Data Sources"]
EHR[EHR Systems]
Claims[Claims Data]
Labs[Lab Results]
Genomics[Genomic Profiles]
end
subgraph NLP["NLP Processing"]
Extract[Entity Extraction]
Normalize[Terminology Normalization]
Temporal[Temporal Reasoning]
end
subgraph Match["Matching Engine"]
Criteria[I/E Criteria Parser]
Score[Eligibility Scoring]
Rank[Patient Ranking]
end
subgraph Output["Results"]
Patients[Matched Patients]
Sites[Optimized Sites]
Alerts[Provider Alerts]
end
EHR --> Extract
Claims --> Extract
Labs --> Normalize
Genomics --> Normalize
Extract --> Criteria
Normalize --> Criteria
Temporal --> Score
Criteria --> Score
Score --> Rank
Rank --> Patients
Patients --> Sites
Sites --> Alerts
| Vendor | Technology | Data Assets | Best For |
|---|---|---|---|
| NextTrial.ai | NLP + ML matching | EHR integration | Complex I/E criteria |
| H1 | KOL mapping, investigator analytics | Publications, claims, trials | Site selection, investigator finding |
| Deep 6 AI | Real-time EHR search | Health system partnerships | Oncology, rare disease |
| TriNetX | Federated network | 400M+ patient records | Global feasibility |
| Komodo Health | Healthcare map | Claims + RWD | Patient path analysis |
NextTrial.ai uses natural language processing to bridge the gap between protocol criteria and patient records. The platform ingests unstructured data—reading clinician notes, pathology reports, and genetic profiles—then matches patients by automatically flagging those who meet complex inclusion/exclusion criteria, and optimizes site selection by predicting which investigator sites have the highest density of eligible patients (NextTrial 2024).
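Once NLP has normalized the record, the matching step itself is conceptually simple: each inclusion/exclusion criterion becomes a predicate over structured fields. A minimal sketch (field names and criteria are invented for illustration; they are not NextTrial.ai's schema):

```python
# Toy eligibility matcher over a normalized patient record. In practice
# the hard part is upstream: NLP extraction of these fields from notes.

CRITERIA = [
    ("age 18-75",          lambda p: 18 <= p["age"] <= 75),
    ("ECOG 0-1",           lambda p: p["ecog"] <= 1),
    ("no prior anti-EGFR", lambda p: "anti-EGFR" not in p["prior_therapies"]),
]

def eligibility(patient):
    failed = [name for name, pred in CRITERIA if not pred(patient)]
    return {"eligible": not failed, "failed_criteria": failed}

patient = {"age": 62, "ecog": 1, "prior_therapies": ["FOLFOX"]}
print(eligibility(patient))  # all three criteria pass
```

Returning the list of failed criteria, rather than a bare yes/no, is what makes such systems useful to coordinators: a patient failing only one waivable criterion is worth a second look.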
H1 takes a different approach, mapping the global network of Key Opinion Leaders and investigators. By analyzing billions of data points—publications, claims data, and clinical trial records—H1 helps sponsors find investigators who are not just academically prominent but actively treating the target patient population (H1 2024).
AI in Clinical Operations and Data Quality
Once data starts flowing, it must be cleaned. As data volume increases (wearables generate 1,000+ data points per patient per day), manual query resolution becomes unsustainable. In practice, the emerging approach layers multiple AI techniques across the data pipeline, each suited to different types of quality problems.
Rule-based checks remain the foundation: programmed validations that flag impossible values (negative ages, dates in the future) or logical inconsistencies (randomization before consent). These deterministic checks are fast, auditable, and well-understood.
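The deterministic checks described above can be sketched in a few lines. Field names are illustrative; in production these edit checks are defined inside the EDC itself:

```python
# Minimal sketch of rule-based edit checks (illustrative field names).
from datetime import date

def edit_checks(record):
    issues = []
    if record["age"] < 0:
        issues.append("age is negative")
    if record["visit_date"] > date.today():
        issues.append("visit date is in the future")
    if record["randomization_date"] < record["consent_date"]:
        issues.append("randomized before informed consent")
    return issues

rec = {"age": 54,
       "visit_date": date(2024, 3, 1),
       "consent_date": date(2024, 2, 10),
       "randomization_date": date(2024, 2, 1)}
print(edit_checks(rec))  # → ['randomized before informed consent']
```

Because every rule is explicit, the output is fully auditable—the property that keeps rule-based checks foundational even as ML layers are added on top.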
Machine learning anomaly detection adds statistical pattern recognition. ML models trained on historical trial data can identify unusual distributions, digit preferences suggestive of fabrication, or sites whose data patterns diverge from comparators. Unlike rule-based systems, ML can surface problems that were not anticipated during study design.
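One classical statistical screen of this kind is terminal-digit preference: manually entered measurements should have roughly uniform last digits, so a heavy excess of 0s and 5s suggests rounding or fabrication. A toy sketch (the 0.2 baseline is the chance expectation for two of ten digits; any alert threshold would be study-specific):

```python
# Toy digit-preference screen. Real anomaly detection combines many such
# features with models trained on historical trial data.
from collections import Counter

def terminal_digit_excess(values):
    """Fraction of values ending in 0 or 5, minus the 0.2 expected by chance."""
    digits = [int(str(int(v))[-1]) for v in values]
    counts = Counter(digits)
    return (counts[0] + counts[5]) / len(digits) - 0.2

suspicious = [120, 125, 130, 110, 115, 120, 125, 135, 140, 115]
print(round(terminal_digit_excess(suspicious), 2))  # every value ends in 0/5 → 0.8
```

A site whose blood-pressure readings all end in 0 or 5 is not necessarily fraudulent—analog cuffs encourage rounding—but the pattern is exactly the kind of unanticipated signal ML-era monitoring is designed to surface for human review.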
Large language models (LLMs) contribute to discrepancy analysis—parsing free-text fields in adverse event narratives, medical history, or concomitant medication entries to identify inconsistencies that would be invisible to structured checks. An LLM might flag that a narrative describes “chest pain radiating to left arm” while the coded adverse event is “headache.”
Cross-domain validation links data across sources: comparing EDC entries against central lab results, device uploads against visit schedules, and ePRO responses against clinical observations. Discrepancies across domains often indicate transcription errors or protocol deviations.
The output is a set of automated actions: auto-generated queries with draft text for site response, risk flags that prioritize monitoring attention, trend alerts that surface site-level quality patterns, and quality reports for oversight review (Figure 23.7). The goal is not to replace data management staff but to reduce the time spent on low-value triage and increase the proportion of effort spent on judgment-intensive resolution.
flowchart LR
subgraph Ingest["Data Ingestion"]
EDC[EDC Data]
Devices[Device Data]
Labs[Central Labs]
ePRO[ePRO Responses]
end
subgraph AI["AI Processing"]
Rules[Rule-Based Checks]
ML[ML Anomaly Detection]
LLM[LLM Discrepancy Analysis]
Cross[Cross-Domain Validation]
end
subgraph Actions["Automated Actions"]
AutoQuery[Auto-Generated Queries]
Flag[Risk Flags]
Trend[Trend Alerts]
Report[Quality Reports]
end
EDC --> Rules
Devices --> ML
Labs --> Cross
ePRO --> LLM
Rules --> AutoQuery
ML --> Flag
LLM --> AutoQuery
Cross --> Trend
Flag --> Report
Trend --> Report
| Tool | Primary Approach | Best Feature | Integration Depth |
|---|---|---|---|
| Saama | Clinical Command Center | Unified EDC/CTMS/eTMF view | Deep multi-system |
| Octozi | LLM-based discrepancy detection | Natural language queries | EDC-focused |
| Veeva CDB | Vault-native data management | Single-platform simplicity | Veeva ecosystem |
| Medidata Detect | Signal detection algorithms | Safety signal identification | Rave ecosystem |
Saama provides an AI-driven Clinical Command Center. Their platform unifies data from EDC, CTMS, and eTMF to provide an integrated view of trial health. Saama’s AI models, trained on more than 300 million data points, predict site non-compliance and enrollment delays, enabling intervention before problems become critical (Saama Technologies 2024).
Octozi applies Large Language Models to automate data review. The platform performs automated discrepancy detection, scanning for inconsistencies such as “male patient listed as pregnant” or “adverse event date before informed consent.” By surfacing these issues instantly, Octozi reduces the need for the line-by-line manual review that traditionally consumed data management resources (Octozi 2024).
23.6 Emerging Methodologies and Operational Solutions
Innovation in clinical trials is not limited to software; it includes new methodological frameworks that leverage real-world data, federated learning, and digital simulations to rethink the evidence-generation process.
Solving Logistical Challenges with IT
Modern trials face significant logistical friction (see Section 18.6.1). The challenges are structural: supply chains that must deliver temperature-sensitive products to thousands of endpoints, sites overwhelmed by redundant data entry across disconnected systems, protocols too complex for manual compliance tracking, and global operations fragmented across time zones and organizations. These are coordination problems, and coordination problems respond to better information infrastructure.
Supply chain complexity has intensified with decentralized trials and direct-to-patient distribution. Traditional approaches relied on spreadsheets and email to track inventory and shipments; delays were discovered after the fact. Modern systems integrate IoT sensors that monitor temperature, location, and chain of custody in real time. AI-based demand forecasting uses enrollment trajectories and visit schedules to predict resupply needs before stockouts occur. The result is reduced waste (fewer expired products) and fewer patient visits disrupted by supply failures.
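The forecasting logic described above can be reduced to a back-of-envelope projection: kit demand over a horizon is driven by the current census, the enrollment trajectory, and the visit schedule. A minimal sketch with invented, illustrative numbers:

```python
# Toy resupply forecast. Real systems replace the constant enrollment
# rate with a fitted trajectory and per-visit kit requirements.

def kits_needed(enrolled, weekly_enrollment, weeks_ahead, visits_per_patient_per_week):
    """Expected kit consumption over the forecast horizon."""
    total = 0.0
    patients = enrolled
    for _ in range(weeks_ahead):
        patients += weekly_enrollment            # new patients joining each week
        total += patients * visits_per_patient_per_week
    return total

demand = kits_needed(enrolled=40, weekly_enrollment=3, weeks_ahead=4,
                     visits_per_patient_per_week=0.5)
print(demand)  # compare against on-hand inventory to time the next shipment
```

The value of even this crude model is that it turns resupply from a reactive process (discovering a stockout) into a scheduled one (shipping before projected demand exceeds inventory).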
Site burden accumulates when coordinators must enter the same data into multiple systems—pulling information from the electronic health record, transcribing it to the EDC, and reconciling discrepancies later. EHR-to-EDC integration automates the flow of structured data (lab values, vital signs, demographics) from the source system to the trial database, reducing transcription errors and freeing coordinator time for patient-facing work. This integration requires careful validation and mapping, but when implemented well, it addresses one of the most persistent complaints from clinical sites.
Protocol deviations often result from complexity: too many visits, too many procedures, too many eligibility criteria for staff to track manually. Real-time nudge systems monitor visit windows and upcoming assessments, alerting coordinators before a deviation occurs rather than flagging it retrospectively. Detection systems analyze patterns across sites to identify systematic compliance failures that may indicate training gaps or protocol design problems.
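A nudge engine of this kind is, at its core, a date comparison run continuously: alert while the visit window is still open rather than flag the deviation afterwards. A minimal sketch (visit schema and the 3-day warning horizon are illustrative):

```python
# Toy visit-window nudge. Field names and the warning horizon are invented.
from datetime import date, timedelta

def nudges(scheduled_visits, today, warn_days=3):
    """Visits not yet completed whose window closes within `warn_days`."""
    alerts = []
    for visit in scheduled_visits:
        if visit["completed"]:
            continue
        window_close = visit["target"] + timedelta(days=visit["window_days"])
        if today <= window_close <= today + timedelta(days=warn_days):
            alerts.append(f"{visit['id']}: window closes {window_close}")
    return alerts

visits = [
    {"id": "W4", "target": date(2024, 6, 1), "window_days": 7, "completed": False},
    {"id": "W8", "target": date(2024, 7, 1), "window_days": 7, "completed": False},
]
print(nudges(visits, today=date(2024, 6, 6)))  # W4's window closes 2024-06-08
```

The difference from retrospective monitoring is entirely in when this check runs: daily against open windows, not monthly against closed ones.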
Global operations compound all of these challenges. When a trial spans 25 countries, 80 sites, and multiple CRO partners, fragmented regional teams operating from different data sources create reconciliation overhead and conflicting reports. Unified command centers aggregate operational data into a single platform, providing consistent metrics, standardized escalation pathways, and a shared view of trial status across all stakeholders.
| Challenge | Traditional Approach | Modern IT Solution | Reported Impact |
|---|---|---|---|
| Supply Chain | Spreadsheets, email | IoT sensors, AI forecasting | Reduced waste, fewer stockouts |
| Site Burden | Redundant data entry | EHR-to-EDC integration | Less transcription time |
| Protocol Deviations | Retrospective monitoring | Real-time nudge engines | Fewer major deviations |
| Global Operations | Fragmented regional teams | Unified command centers | Single source of truth |
Novel Methodological Approaches
This section covers a set of approaches that challenge traditional assumptions about how evidence is generated and compared: using observational data to support decisions (RWE), using historical or synthetic comparators to reduce concurrent control enrollment, and using models to predict counterfactual outcomes. All remain subjects of active regulatory and scientific debate.
Clinical research data are fragmented across institutions, each with privacy regulations, competitive interests, and technical barriers to sharing. Federated learning addresses this by training models collaboratively without centralizing data: algorithms travel to data sources, train locally, and share only model updates (gradients or weights) rather than patient records. Owkin has applied federated learning to construct external control arms from real-world data distributed across hospital networks. Their 2025 publication in Nature Communications demonstrated federated external control arms for oncology trials, enabling international collaboration while maintaining GDPR and HIPAA compliance (Owkin 2024). The practical value is clearest in rare diseases and oncology, where recruiting concurrent controls may be infeasible or ethically questionable. However, the approach requires sophisticated infrastructure, standardized data formats across institutions, and careful attention to the comparability of patient populations.
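The central mechanic—models travel, data stays put—can be shown in miniature with federated averaging of per-site gradients. This toy fits a one-parameter linear model across two "hospitals" that never pool their records; it is a conceptual sketch, not Owkin's implementation:

```python
# Toy federated averaging: each site computes a gradient locally on its
# own data; only the gradient (never patient records) is shared.

def site_gradient(w, data):
    """Mean squared-error gradient for y ~ w*x on one site's data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def federated_fit(site_datasets, rounds=200, lr=0.05):
    w = 0.0
    for _ in range(rounds):
        grads = [site_gradient(w, d) for d in site_datasets]  # computed at each site
        w -= lr * sum(grads) / len(grads)                     # only updates aggregated
    return w

# Two sites whose (never-pooled) data are both consistent with y = 2x:
site_a = [(1.0, 2.0), (2.0, 4.0)]
site_b = [(3.0, 6.0), (0.5, 1.0)]
print(round(federated_fit([site_a, site_b]), 3))  # converges to 2.0
```

Real deployments add secure aggregation, differential privacy, and handling of non-identically-distributed site data—the hard parts that the toy omits—but the privacy argument rests on exactly this structure: gradients cross institutional boundaries, records do not.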
The term digital twin originated in engineering, where virtual models of physical systems enable simulation and optimization. In clinical research, digital twins are computational models that predict an individual patient’s disease trajectory under control conditions, based on baseline characteristics and historical data from similar patients (Laubenbacher, Sluka, and Glazier 2021).
Unlearn.AI’s PROCOVA methodology exemplifies this approach. Their models—trained on historical patient data—generate prognostic scores that predict each patient’s likely outcome if assigned to the control arm. These scores are then used as covariates in the primary analysis, reducing residual variance and enabling smaller control groups while maintaining unbiased treatment effect estimates and Type I error control (Unlearn.AI 2024).
The European Medicines Agency issued a favorable qualification opinion for PROCOVA in September 2022—the first regulatory endorsement of a machine-learning method for sample size reduction in pivotal trials. In January 2024, FDA confirmed that PROCOVA does not deviate from current statistical guidance and is an acceptable methodology. In favorable settings, the approach can reduce control arm sizes while maintaining error control, allowing more participants to receive experimental treatment (Unlearn.AI 2024).
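The statistical intuition behind prognostic-score adjustment can be demonstrated with a small simulation: if a model predicts each patient's control-arm outcome well, subtracting that prediction before comparing arms removes between-patient variance and tightens the treatment-effect estimate. This is a sketch of the general idea on simulated data, not the PROCOVA implementation:

```python
# Simulated demonstration of variance reduction from prognostic adjustment.
# All data are synthetic; the "prognostic score" is known by construction.
import random
import statistics

def one_trial(rng, n=100, tau=0.5):
    est_raw, est_adj = [], []
    for arm in (0, 1):
        for _ in range(n):
            prog = rng.gauss(0, 1)                 # prognostic score for this patient
            y = prog + tau * arm + rng.gauss(0, 0.3)
            est_raw.append((arm, y))
            est_adj.append((arm, y - prog))        # subtract the predicted outcome
    diff = lambda pairs: (statistics.mean(y for a, y in pairs if a == 1)
                          - statistics.mean(y for a, y in pairs if a == 0))
    return diff(est_raw), diff(est_adj)

rng = random.Random(0)
raw, adj = zip(*(one_trial(rng) for _ in range(300)))
print(f"SD of estimate, raw: {statistics.stdev(raw):.3f}  adjusted: {statistics.stdev(adj):.3f}")
```

Both estimators are unbiased for the true effect (0.5 here); the adjusted one is simply much less variable, which is what permits smaller control arms at the same power. Berry Consultants' critique, discussed below, is that this gain is available from any good covariate adjustment, proprietary or not.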
Real-World Evidence and Regulatory Considerations
The randomized controlled trial remains the gold standard for establishing causal treatment effects, but real-world evidence (RWE)—derived from observational data collected outside the controlled trial setting—is playing an expanding role in drug development and regulatory decision-making.
Real-world data (RWD) includes:
- electronic health records
- claims databases
- disease registries
- wearable devices
- patient-generated data
When analyzed appropriately, RWD can generate real-world evidence about drug safety, effectiveness, and utilization patterns. The distinction matters: RWD is the raw data; RWE is the clinical evidence derived from it through rigorous analysis (U.S. Food and Drug Administration 2018).
FDA has articulated when RWE may support regulatory decisions. For safety, RWE has long been used for post-marketing surveillance—detecting rare adverse events that trials were not powered to identify. For effectiveness, FDA acceptance is more cautious but expanding. The 21st Century Cures Act directed FDA to evaluate RWE for approving new indications for existing drugs and for satisfying post-marketing study requirements. FDA’s 2023 final guidance on RWE for regulatory decisions emphasized that data relevance, reliability, and analytic rigor determine acceptability—not the mere existence of a large dataset (U.S. Food and Drug Administration 2023c).
The practical applications span a spectrum. Single-arm trials with external controls compare treated patients against matched historical or concurrent observational cohorts—most defensible in diseases with well-characterized natural history and no effective standard of care. Hybrid designs randomize a reduced control arm while borrowing information from external data to increase statistical precision. Post-marketing effectiveness studies use RWD to assess whether efficacy demonstrated in controlled trials translates to real-world populations with comorbidities and concomitant medications excluded from pivotal trials.
The core challenge is confounding: in observational data, treatment selection is not random. Patients who receive a therapy may differ systematically from those who do not, and these differences—rather than the treatment itself—may explain observed outcomes. Propensity score methods, instrumental variables, and target trial emulation frameworks attempt to address confounding, but none can fully substitute for randomization. Unmeasured confounders remain a fundamental limitation.
RWE is most credible when: the outcome is objective and reliably captured in routine care; the comparison is against natural history rather than an active comparator; the patient population in the RWD source is demonstrably similar to the trial population; and sensitivity analyses show robustness to plausible unmeasured confounding. Even then, regulators typically view RWE as supportive rather than dispositive for efficacy claims—strengthening a submission that includes randomized evidence rather than replacing it.
FDA’s 2023 guidance on externally controlled trials provides a framework for using real-world data to construct external control arms, addressing data quality, comparability, and bias mitigation strategies (U.S. Food and Drug Administration 2023b). The guidance emphasizes that external controls are most appropriate when concurrent randomization is infeasible or unethical, and when the disease has a well-characterized natural history with reliable outcome measurement in routine care.
Synthetic control arms (SCAs) construct external comparators from patient-level data in historical trials rather than recruiting new control patients. Medidata’s platform draws on a database spanning over 36,000 trials and 11 million patients to statistically match historical controls to current trial populations.
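The matching step at the heart of an SCA can be sketched as nearest-neighbor selection on baseline covariates. This toy matches without replacement on raw (unstandardized) values; real pipelines use propensity models, standardization, and balance diagnostics on top of this idea, and all names here are invented:

```python
# Toy historical-control matching on baseline covariates (illustrative).

def match_controls(patients, historical, keys=("age", "ecog")):
    def dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in keys)
    matches = []
    pool = list(historical)
    for p in patients:
        best = min(pool, key=lambda h: dist(p, h))
        matches.append((p["id"], best["id"]))
        pool.remove(best)          # match without replacement
    return matches

patients   = [{"id": "P1", "age": 60, "ecog": 1}, {"id": "P2", "age": 45, "ecog": 0}]
historical = [{"id": "H1", "age": 44, "ecog": 0}, {"id": "H2", "age": 62, "ecog": 1},
              {"id": "H3", "age": 70, "ecog": 2}]
print(match_controls(patients, historical))  # → [('P1', 'H2'), ('P2', 'H1')]
```

Note what the sketch cannot do: it matches only on measured covariates. The comparability concerns raised later in this section—era effects, changes in standard of care, unmeasured confounders—are precisely the things no matching algorithm, however sophisticated, can see.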
In October 2020, FDA approved a precedent-setting hybrid SCA for Medicenna Therapeutics’ Phase III trial in recurrent glioblastoma—the first acceptance of a hybrid external control in a registrational trial for an indication that previously required traditional 1:1 randomization. The approach reduced prospective control enrollment by approximately two-thirds (Medidata Solutions 2024c).
SCAs are most applicable in rare or life-threatening diseases with inadequate standard-of-care, where historical control data are robust and disease progression is well-characterized. FDA has been most receptive in early-phase development and single-arm trials, with hybrid models (combining historical and concurrent controls) gaining acceptance for later stages.
These technologies generate both enthusiasm and legitimate skepticism. Several challenges merit attention.
Verification, validation, and uncertainty quantification (VVUQ) remain incompletely standardized. A 2025 review in npj Digital Medicine emphasized that VVUQ frameworks are essential for safety and efficacy but vary widely across implementations, hampering regulatory evaluation and clinical adoption (Sel et al. 2025).
Model opacity is a recurring concern. Berry Consultants has argued that PROCOVA is essentially an extension of classical covariate adjustment—substituting a proprietary neural network for transparent regression models applied to the same baseline data. When model details are withheld, sponsors and regulators cannot interrogate, replicate, or improve the methodology. As statistician Scott Berry has noted: “I highly doubt in most scenarios that…this is actually better” than standard covariate adjustment with the same data.
Data quality and completeness constrain all approaches. In oncology, the data needed to model tumor dynamics are often noisy, incomplete, and subject to collection burden (Venkatesh, Raza, and Kvedar 2022). Statistical models introduce their own uncertainty, particularly when predictions must generalize across populations and disease subtypes.
Uncertainty quantification is essential but often underemphasized. A single predicted outcome per patient is inadequate; scientifically defensible use requires generating full distributions of potential outcomes—what Berry calls “digital googols” rather than digital twins—to capture the uncertainty inherent in counterfactual prediction.
Comparability assumptions underpin all external control approaches. If the historical population differs systematically from the current trial population—due to changes in standard of care, patient selection, or measurement practices—treatment effect estimates may be biased regardless of the sophistication of the matching algorithm.
FDA has signaled that digital twins and AI-assisted trial design will receive significant regulatory oversight when they directly affect trial conduct and interpretation. In a January 2025 JAMA publication, FDA Commissioner Robert Califf and senior officials described digital twins as an area of active concern, distinct from lower-risk AI applications like patient matching or data cleaning that receive lighter scrutiny (Warraich, Tazbaz, and Califf 2024).
The regulatory path forward will likely require sponsors to demonstrate not only statistical validity but also model transparency, reproducibility, and appropriate characterization of uncertainty. Technologies that rely on opaque, proprietary methods may face higher evidentiary bars than those built on interpretable, well-documented approaches.
| Approach | How It Works | Regulatory Status | Key Limitation |
|---|---|---|---|
| Federated Learning | Models train locally; only updates shared | EMA letter of support; FDA engagement | Infrastructure complexity |
| Digital Twins (PROCOVA) | Prognostic scores as covariates | EMA qualified (2022); FDA acceptable (2024) | Model opacity; uncertainty |
| Synthetic Control Arms | Historical data matched to trial | FDA approved hybrid Phase III (2020) | Comparability assumptions |
23.7 Future Impact and Economics
The convergence of AI and digital platforms is shifting the fundamental cost drivers of clinical development, potentially reducing cycle times and sample sizes while introducing new operational complexities.
AI Impact on Trial Operations
The technologies described in this chapter have the potential to reshape the economics of clinical development. Recall the economics from Chapter 6: $2.3 billion to bring a drug to market, $40,000 per day to operate a Phase III trial, and $600,000 to $1.3 million per day in opportunity cost from delays (DiMasi, Grabowski, and Hansen 2016; Deloitte Centre for Health Solutions 2025). AI-enabled tools are being applied to each of these cost drivers, though the realized impact depends on implementation quality, governance, and the specific operational context.
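To make these figures concrete, consider a back-of-envelope calculation for a single month of delay, using the daily operating burn and the midpoint of the quoted opportunity-cost range (the 30-day delay is an assumed scenario, not a cited statistic):

```python
# Back-of-envelope cost of a 30-day Phase III delay at the figures
# quoted above. The delay length is an illustrative assumption.
operating_per_day = 40_000
opportunity_per_day = (600_000 + 1_300_000) / 2   # midpoint of the quoted range
delay_days = 30

print(f"${delay_days * (operating_per_day + opportunity_per_day):,.0f}")  # → $29,700,000
```

At roughly $30 million per month of avoidable delay, even AI tools that shave modest percentages off enrollment or cleaning timelines clear most plausible cost-of-deployment hurdles.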
| Current Cost Structure | AI Interventions | Impact Estimates |
|---|---|---|
| ~$2–3B to market (est.) | Protocol Optimization AI | Reduced Development Cost |
| High attrition (~90%) | Digital Twins & Synthetic Arms | Smaller Trials, Similar Power |
| High daily burn (~$40K/day) | Autonomous Agents | Increased Automation |
| CRO margins (~15–25%) | Direct Automation | Margin Pressure |
| Time to first patient (~166 days avg) | Parallel Processing AI | Faster Timelines |
One way to make the economics concrete is to translate “cost to market” into a small set of dominant drivers: late-stage attrition, enrollment delays, protocol amendments, monitoring intensity, and the long tail of data cleaning. The question is not whether automation can eliminate scientific uncertainty, but whether it can shift these operational drivers enough to change the capitalized cost of development in a meaningful way. The table below summarizes the main levers discussed in this chapter in a format that mirrors how sponsors often reason about cost: what drives it today, what kind of automation or analytics is proposed, and what types of impact are plausibly expected.
| Cost Driver | Current State | AI Approach | Potential Impact |
|---|---|---|---|
| Failed Programs | ~90% of drugs do not reach market | Protocol simulation, patient selection AI | Earlier go/no-go decisions; potentially fewer late-stage failures |
| Enrollment Delays | Most trials miss enrollment targets | NLP-based patient matching, predictive site selection | Faster enrollment (magnitude varies by indication) |
| Data Cleaning | Manual query resolution at $50-100/query | ML-assisted triage and draft resolutions | Reduced query burden |
| Monitoring Costs | 9-14% of CRO budget on site visits | Risk-based + AI-detected anomalies | Reduced on-site visit frequency |
| Protocol Amendments | $500K+ per substantial amendment | Design simulation before finalization | Potentially fewer amendments |
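One way to reason about how these levers combine is to sketch the arithmetic directly. The sketch below is illustrative only: the cost shares and reduction percentages are hypothetical assumptions, not estimates from this chapter's sources, and the point is the compounding structure rather than the specific numbers.

```python
# Illustrative sketch: how operational levers might compound into total savings.
# All shares and reduction percentages are hypothetical assumptions.

BASELINE_COST_M = 100.0  # assumed Phase III direct cost in $M (cf. the ~$100M benchmark)

# (driver, assumed share of cost affected, assumed reduction within that share)
levers = [
    ("enrollment delays",   0.25, 0.30),
    ("data cleaning",       0.10, 0.50),
    ("on-site monitoring",  0.12, 0.40),
    ("protocol amendments", 0.05, 0.40),
]

remaining = BASELINE_COST_M
for name, share, reduction in levers:
    saving = BASELINE_COST_M * share * reduction
    remaining -= saving
    print(f"{name:<20} saves ${saving:5.1f}M")

print(f"residual cost: ${remaining:.1f}M "
      f"({100 * (1 - remaining / BASELINE_COST_M):.0f}% reduction)")
```

Under these placeholder assumptions the levers sum to roughly a 19% reduction, which illustrates why partial automation of several drivers matters more than full automation of any single one.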
CROs currently command gross margins of 40-50% (with operating margins typically 6-16%) on top of direct costs (see Chapter 6). AI-enabled tools are increasingly capable of performing tasks that have traditionally justified those margins—document processing, query management, routine monitoring, and medical coding. As sponsors develop their own AI capabilities, they may demand either lower CRO fees or differentiated services that require domain expertise AI cannot replicate. The speed and extent of this shift remain uncertain.
The “daily burn” of operating a trial is largely a labor-and-coordination cost: monitors reviewing data, coordinators entering and reconciling information, medical writers preparing narrative sections, and project managers coordinating activities across teams and vendors. One way to analyze how automation might change this cost is to decompose spend by role and task category, recognizing that some activities are amenable to partial automation (for example, drafting or triage) while others remain inherently judgment- and accountability-driven. The table below provides an illustrative decomposition.
| Role | Current Cost Contribution | AI Automation Potential | Timeline |
|---|---|---|---|
| Data Manager | Query generation, cleaning | 80% automatable | Now - 2027 |
| Clinical Monitor | Source verification, oversight | 50% automatable (remote SDV) | 2025 - 2028 |
| Medical Writer | Narratives, CSR sections | 60% draft automation | Now - 2026 |
| Project Manager | Status tracking, reporting | 40% automatable | 2026 - 2029 |
| Regulatory Affairs | Submission compilation | 70% automatable | 2025 - 2027 |
In principle, if automation reduces repetitive coordination work and shortens the reconciliation tail, the operating cost per day could decline. The magnitude and reliability of any reduction depend on protocol complexity, data sources, and the quality of implementation and oversight.
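This decomposition can be made concrete with a small calculation that combines the automation potentials from the table above with assumed cost shares per role. The cost shares (and the residual "other" bucket) are hypothetical assumptions introduced for illustration.

```python
# Sketch: decompose the daily trial operating cost by role, then apply the
# automation fractions from the table above. Cost shares are hypothetical.

DAILY_BURN = 40_000  # $/day benchmark cited in the text

# role -> (assumed share of daily burn, automation potential from the table)
roles = {
    "data manager":            (0.20, 0.80),
    "clinical monitor":        (0.30, 0.50),
    "medical writer":          (0.10, 0.60),
    "project manager":         (0.15, 0.40),
    "regulatory affairs":      (0.10, 0.70),
    "other (not automatable)": (0.15, 0.00),
}

new_daily = sum(DAILY_BURN * share * (1 - auto) for share, auto in roles.values())
print(f"illustrative post-automation burn: ${new_daily:,.0f}/day")
```

With these placeholder shares, the implied post-automation burn is about half the baseline, which is why even aggressive role-level automation figures translate into a more modest total reduction: the non-automatable residual and the partially automated roles dominate.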
The logistics challenges described in Chapter 18—multi-country execution, dozens of sites, long startup timelines, and tightly constrained supply chains—are coordination problems in a precise sense: progress depends on many interdependent tasks that are distributed across organizations, time zones, and systems of record. Delays rarely arise from a single missing document or a single late shipment; they arise from cascades, such as a contract delay that postpones site activation, which shifts enrollment curves, which changes drug demand forecasts, which increases the risk of stockouts or wastage. At the same time, oversight requirements impose constraints: actions must be traceable, exceptions must be reviewable, and accountability cannot be delegated to an opaque process.
In that environment, “agentic” automation is most defensible when it behaves like structured orchestration rather than free-form autonomy (Figure 23.8). The practical contributions are to maintain state across workflows, to triage and route exceptions, to generate standardized artifacts (for example, draft correspondence or document metadata), and to propose next actions that a responsible owner can approve. Used this way, orchestration can reduce coordination overhead by turning scattered operational signals into a prioritized work queue with clear provenance, while leaving high-stakes decisions—such as changes that affect participant safety, protocol interpretation, or regulatory commitments—under explicit human control.
flowchart LR
subgraph Orchestrator["Trial Orchestration Agent"]
Master[Master Coordinator]
end
subgraph Functional["Functional Agents"]
Site[Site Activation Agent]
Enroll[Enrollment Agent]
Supply[Supply Chain Agent]
Data[Data Quality Agent]
Safety[Safety Monitoring Agent]
Doc[Document Agent]
end
subgraph Actions["Autonomous Actions"]
A1[Generate site contracts]
A2[Match patients to criteria]
A3[Predict inventory needs]
A4[Resolve queries]
A5[Draft safety narratives]
A6[File TMF documents]
end
Master --> Site
Master --> Enroll
Master --> Supply
Master --> Data
Master --> Safety
Master --> Doc
Site --> A1
Enroll --> A2
Supply --> A3
Data --> A4
Safety --> A5
Doc --> A6
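The exception-routing pattern described above can be sketched as a prioritized work queue that preserves provenance and gates high-stakes items behind human approval. The event schema, priority values, and agent names below are illustrative, not a real system's API.

```python
# Sketch of the orchestration pattern in Figure 23.8: functional agents emit
# signals into one queue; high-stakes items are held for human approval while
# routine items proceed with an audit trail. Schema and priorities are hypothetical.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class WorkItem:
    priority: int                       # lower = more urgent; only field compared
    source: str = field(compare=False)  # provenance: which agent raised the item
    action: str = field(compare=False)
    needs_human: bool = field(compare=False, default=True)

queue: list[WorkItem] = []
heapq.heappush(queue, WorkItem(2, "Supply Chain Agent", "reorder depot stock", False))
heapq.heappush(queue, WorkItem(0, "Safety Monitoring Agent", "review SAE narrative draft", True))
heapq.heappush(queue, WorkItem(1, "Document Agent", "file missing 1572 to TMF", False))

processed = []
while queue:
    item = heapq.heappop(queue)
    gate = "HOLD for human approval" if item.needs_human else "auto-execute with audit log"
    processed.append(item.action)
    print(f"[{item.source}] {item.action} -> {gate}")
```

The design point is that the safety item surfaces first but never auto-executes: urgency ordering and autonomy level are independent decisions, which is what keeps accountability with a responsible owner.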
Site activation—getting from protocol approval to first patient enrolled—currently averages 166 days for Phase III trials (Lamberti et al. 2024a). The delays are distributed across sequential steps: feasibility assessment, contract negotiation, IRB/EC submission, site training, and investigational product shipment. Each step has its own bottleneck, and the steps are typically processed in sequence rather than in parallel.
AI-enabled tools can address each bottleneck. Predictive site scoring, trained on historical enrollment data, can replace manual feasibility questionnaires. Contract generation from standardized templates can reduce legal negotiation cycles. Auto-populated submission packages can accelerate IRB preparation. Adaptive e-learning modules can replace in-person training for routine content. Predictive supply chain systems can coordinate IP shipment with site readiness.
The potential impact is not merely faster execution of each step but a shift from sequential to parallel processing—initiating multiple activation workstreams simultaneously rather than waiting for each to complete. The realized gains depend on integration quality and the extent to which sites and sponsors adopt common platforms.
Humans and organizations often process activation tasks sequentially. Software can execute some workstreams in parallel—such as assembling submission-ready document sets or preparing templated contract packets—while still requiring review, signature, and site-specific customization. Parallelization can reduce elapsed time, but the magnitude depends on local ethics processes, contracting norms, and the degree of platform standardization.
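The sequential-versus-parallel distinction can be made precise with a toy critical-path calculation. The step durations below are hypothetical allocations of the cited 166-day average, and the fully parallel figure is an idealized lower bound: real dependencies (for example, training after contract execution) keep the achievable elapsed time above it.

```python
# Sketch: site-activation elapsed time, sequential vs parallelized.
# Step durations (days) are hypothetical; only the ~166-day total is the cited benchmark.
steps = {
    "feasibility assessment": 30,
    "contract negotiation":   60,
    "IRB/EC submission":      45,
    "site training":          14,
    "IP shipment":            17,
}

sequential = sum(steps.values())  # each step waits on the previous one
parallel = max(steps.values())    # idealized: all workstreams start at once
print(f"sequential: {sequential} days; fully parallel lower bound: {parallel} days")
```

The gap between the two numbers is the prize that parallel activation workflows target; how much of it is capturable depends on which dependencies are genuinely sequential.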
The rise of decentralized trials (DCTs) creates a logistics challenge of much greater scale: instead of shipping investigational product to a limited number of sites, sponsors may ship to thousands of patient homes and coordinate services across a wider set of vendors and geographies (U.S. Food and Drug Administration 2024). In practice, this pushes organizations toward greater automation and analytics, not because human oversight becomes unnecessary, but because routine coordination tasks and exception handling can otherwise overwhelm operational teams. The relevant question is therefore not “automation or not,” but which components can be standardized and monitored with traceable workflows while keeping safety- and compliance-critical decisions under accountable review.
| DCT Challenge | Traditional Approach | AI-Enabled Approach |
|---|---|---|
| Cold-Chain Monitoring | Periodic temperature logs | Real-time IoT + predictive intervention |
| Home Nursing Coordination | Manual scheduling | AI-assisted routing and scheduling |
| Patient Adherence | Site calls, diaries | Wearable + app AI detecting non-adherence patterns |
| Document Collection | Email reminders, faxes | TMF agents that chase, collect, and file automatically |
| Regulatory Compliance | Per-country manual review | Multi-jurisdiction compliance AI |
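As a concrete example of the cold-chain row, a minimal excursion rule might look like the sketch below. The 2-8 °C band is the standard refrigerated range, while the reading interval, tolerance, and sensor trace are hypothetical parameters chosen for illustration.

```python
# Sketch: a minimal cold-chain rule for DCT home shipments -- flag a shipment
# when cumulative time outside the 2-8 degC band exceeds an assumed tolerance.
# Reading interval, tolerance, and the example trace are hypothetical.

INTERVAL_MIN = 10    # minutes between IoT sensor readings (assumed)
TOLERANCE_MIN = 30   # assumed allowable cumulative excursion, minutes

def excursion_minutes(readings_c, lo=2.0, hi=8.0, interval=INTERVAL_MIN):
    """Cumulative minutes spent outside the [lo, hi] temperature band."""
    return sum(interval for t in readings_c if not lo <= t <= hi)

shipment = [4.1, 4.3, 8.9, 9.2, 9.0, 7.8, 5.0, 1.6, 4.2]  # example sensor trace
minutes_out = excursion_minutes(shipment)
print(f"{minutes_out} min out of range ->",
      "INTERVENE (reroute/replace)" if minutes_out > TOLERANCE_MIN else "OK")
```

The "predictive intervention" in the table amounts to evaluating a rule like this continuously in transit rather than retrospectively from a downloaded log, so a replacement shipment can leave before the original arrives compromised.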
Economics and Business Models
Sponsors face a strategic choice: invest in building AI capabilities, purchase from vendors, or partner with AI-native CROs. Each path has different economic implications:
| Strategy | Upfront Investment | Ongoing Cost | Risk | Best For |
|---|---|---|---|---|
| Build Internal AI | $10-50M+ | High (talent, compute) | Technology obsolescence | Top 20 pharma |
| Buy Platform AI | $1-5M licensing | Medium (per-trial fees) | Vendor lock-in | Mid-size sponsors |
| Partner with AI-Native CRO | Minimal | Higher per-trial cost | Dependency on partner | Biotech, small sponsors |
| Hybrid Model | $5-20M | Medium | Complexity | Most large sponsors |
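The strategic choice is largely a fixed-versus-marginal-cost tradeoff, which a short calculation makes visible. The upfront and per-trial figures below are rough midpoints consistent with the table; the per-trial costs and trial volumes are assumed inputs, since portfolio size is what actually drives the crossover.

```python
# Sketch: five-year total cost of the sourcing strategies in the table above.
# Upfront/per-trial figures ($M) are assumed midpoints; trial volume is the
# decision variable that determines which strategy is cheapest.

def five_year_cost(upfront_m, per_trial_m, trials_per_year, years=5):
    return upfront_m + per_trial_m * trials_per_year * years

strategies = {
    "build internal AI": (30.0, 0.5),  # high fixed cost, low marginal cost
    "buy platform AI":   (3.0, 1.5),
    "AI-native CRO":     (0.0, 3.0),   # no fixed cost, highest per-trial cost
}

for trials in (2, 10):
    best = min(strategies, key=lambda s: five_year_cost(*strategies[s], trials))
    print(f"{trials} trials/year -> cheapest (under these assumptions): {best}")
```

Under these placeholder numbers, a small sponsor is better served buying or partnering while a large pipeline justifies building, which is consistent with the "Best For" column above.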
The 2030 Trial: A Vision
By 2030, a Phase III trial could look different in organizations that successfully integrate automation into validated workflows. The core idea is not that decision-making becomes “fully autonomous,” but that a larger fraction of routine coordination work becomes standardized, traceable, and partially automated: documents are assembled from structured sources rather than rewritten, operational signals are reconciled across systems of record rather than re-keyed into decks, and exceptions are routed with clear provenance rather than discovered late through periodic reconciliation.
This kind of operating model is easiest to imagine in contexts where three prerequisites are met. First, data capture and integration are mature enough that key events (screen failures, visit-window risk, temperature excursions, missing essential documents) appear as machine-readable signals. Second, organizations invest in validation, access controls, and audit trails so that automated steps remain inspectable and accountable under GCP and 21 CFR Part 11 expectations. Third, sites and vendors actually adopt the workflow, so that automation reduces burden rather than shifting it into more portals and alerts. With those constraints in mind, the scenario in Figure 23.9 is intentionally aspirational: it is meant to illustrate where cycle time and coordination cost might be reduced, not to predict a single inevitable trajectory.
timeline
title The 2030 AI-Native Trial Timeline
section Design (weeks)
Protocol AI : Simulates large virtual cohorts
: Optimizes I/E criteria automatically
: Generates protocol document
section Activation (weeks)
Site AI : Identifies high-performing sites
: Generates contracts, submits to IRBs in parallel
: Ships IP with predictive inventory
section Enrollment (months)
Matching AI : Scans large EHR cohorts
: Alerts physicians of eligible patients
: Patients consent via eConsent platform
section Conduct (months)
Operations AI : Monitors data in real-time
: Generates and resolves queries automatically
: Predicts and prevents protocol deviations
section Close (weeks)
Reporting AI : Locks database with automated reconciliation
: Generates statistical outputs
: Drafts Clinical Study Report
| Metric | 2025 Benchmark | 2030 Projection | Change |
|---|---|---|---|
| Time to First Patient | 166 days | 60 days | -64% |
| Enrollment Duration | 500+ days | 200 days | -60% |
| Daily Operational Cost | $40,000 | $18,000 | -55% |
| Total Phase III Cost | $100M | $45M | -55% |
| Human FTEs per Trial | 50-100 | 15-30 | -70% |
| Data Quality Issues | 15-20% query rate | 3-5% query rate | -75% |
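A useful sanity check on a projection table like this is to see what the daily-cost and duration rows alone imply. The sketch below multiplies them out under the simplifying assumption that operating burn scales with conduct duration; the large gap versus the $100M and $45M totals is a reminder that per-patient fees, drug supply, and fixed startup and close-out costs sit outside the daily burn.

```python
# Sketch: cross-check the projection table -- what do daily burn x duration
# imply on their own? Assumes operating cost ~ daily cost x conduct days,
# deliberately ignoring per-patient, supply, and fixed costs.

def implied_operating_cost_m(daily_cost, days):
    return daily_cost * days / 1e6

baseline = implied_operating_cost_m(40_000, 500)  # 2025 benchmark rows
future = implied_operating_cost_m(18_000, 200)    # 2030 projection rows
print(f"2025 implied operating burn: ${baseline:.0f}M; 2030: ${future:.1f}M")
```

The exercise shows that the daily burn accounts for only a slice of total Phase III cost, so the projected 55% total reduction requires savings across the other cost categories as well.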
Industry and Workforce Implications
The impact of AI adoption varies by stakeholder because costs, responsibilities, and bargaining power are unevenly distributed across the clinical trial enterprise. Some organizations bear the largest absolute operational costs and therefore capture most of the direct savings from efficiency improvements; others earn revenue precisely from the labor-intensive activities that automation targets, making the same improvements economically disruptive. Differences in regulatory exposure also matter: actors responsible for compliance, inspection readiness, and patient protection face higher requirements for validation, audit trails, and accountability, which can slow adoption and increase the fixed costs of implementation.
In practice, the distributional effects depend on where automation is deployed (site workflows, sponsor oversight, CRO service delivery, or platform infrastructure) and on whether new tools reduce burden or simply shift it across organizational boundaries. The table below summarizes these incentives at a high level.
| Stakeholder | Likely Position | Rationale |
|---|---|---|
| Large Sponsors | Positioned to benefit | Can afford AI investment; may capture efficiency gains if governance is effective |
| Biotech | Mixed | May benefit from lower trial costs, but less capital to invest in AI infrastructure |
| Traditional CROs | Facing pressure | Core services may be commoditized; business models may need to evolve |
| AI-Native CROs | Potentially advantaged | Built for automation; lower cost structure if quality can be demonstrated |
| Clinical Sites | Mixed | Less manual work, but also potentially less revenue per patient |
| Patients | Potentially benefit | May see faster access to trials, less burden, more decentralized options |
| Regulators | Adapting | Must develop AI validation frameworks while maintaining safety standards |
The clinical research workforce—coordinators, monitors, data managers—faces significant disruption. Work will shift toward roles focused on system implementation, validation, and oversight as routine tasks are reduced through automation, and the industry will likely need deliberate reskilling and role redesign to preserve domain expertise as workflows change.
The clinical trial technology ecosystem is moving in two directions at once. On the one hand, core systems are consolidating around a small number of platforms that can support regulated, end-to-end execution across EDC, CTMS, eTMF, and supply. On the other hand, sponsors are experimenting with a growing set of analytics and automation capabilities that sit “on top” of those systems and reshape day-to-day work in feasibility, monitoring, data cleaning, and documentation. The result is less a single wave of disruption than a gradual reallocation of investment: stability and standardization in the transactional backbone, with faster iteration at the analytics and workflow layer (Tufts Center for the Study of Drug Development 2024; Fortune Business Insights 2024):
| Segment | 2025 Market Size | 2030 Projection | CAGR | Key Trend |
|---|---|---|---|---|
| Clinical Trial Software | $2.4B | $4.9B | 15% | Cloud consolidation |
| eTMF Systems | $1.4B | $2.5B | 13% | AI-powered automation |
| AI in Clinical Trials | $3.8B | $55B | 46% | Rapid adoption |
| EDC Systems | $1.2B | $2.1B | 12% | EHR integration |
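Market tables like this one rest on the standard compound-annual-growth-rate identity, which is worth applying as a consistency check. Note the implied horizon: $3.8B compounding at 46% reaches roughly $55B only over about seven years, so the cited CAGR likely uses a base year earlier than 2025 in the underlying report.

```python
# Sketch: the CAGR identity behind market projections like the table above.
# CAGR = (end / start) ** (1 / years) - 1

def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

print(f"5-year implied CAGR: {cagr(3.8, 55, 5):.0%}")  # what 2025 -> 2030 would require
print(f"7-year implied CAGR: {cagr(3.8, 55, 7):.0%}")  # closer to the cited 46%
```

Running the identity both ways shows why base years matter when comparing growth figures across vendor reports.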
The IT stack has evolved from a passive repository to an active intelligence layer. When sponsors can integrate design, recruitment, and operations data in near real time, they can often detect operational risks earlier and reduce cycle time compared with fragmented legacy workflows.
Sponsors who adopt these technologies may be able to run trials faster and at lower operational cost, although the magnitude of improvement depends on disease area, protocol complexity, and the quality of implementation and governance.
A practical way to evaluate any “innovation” in this space is to ask three questions. First, what decision does it change, and what evidence shows that the change improves patient safety, data integrity, or development efficiency in the target context? Second, what are the failure modes, and how are they detected, documented, and corrected under GCP and inspection expectations? Third, how does it integrate into real workflows—including incentives across sponsors, CROs, and sites—so that it reduces burden rather than shifting it? Tools that answer those questions well are more likely to be adopted, defended, and sustained in regulated clinical research.