12 Study Design
Clinical trial design addresses a central problem in clinical development: how to turn a clinical question into decision-grade evidence while minimizing bias, confounding, and ambiguity. The design choices—population, endpoints, control, randomization, blinding, follow-up, and analysis plan—determine whether an observed effect can be credibly attributed to the intervention and interpreted for the intended regulatory and clinical use.
Every clinical trial begins with a question. Sometimes the question is simple: does this drug reduce blood pressure more than placebo? More often, the question is nuanced: does this drug reduce cardiovascular events more than existing therapy in patients with moderate-to-severe hypertension and high cardiovascular risk who are not adequately controlled on their current regimen?
When design is weak, trials can fail for reasons unrelated to the drug’s biology. For example, if eligibility criteria are defined too broadly and the endpoint is noisy or poorly aligned with the mechanism, the study may enroll a heterogeneous population in which any true benefit is diluted; the result can be an underpowered “negative” trial that is not persuasive for approval even if a meaningful effect exists in the intended subgroup.
12.1 Randomization
The cornerstone of the modern clinical trial is randomization—the random allocation of participants to treatment groups. This seemingly simple procedure is arguably the most important innovation in the history of clinical research.
The power of randomization lies in what it accomplishes: it ensures that treatment groups are comparable in both known and unknown characteristics. We know that age, sex, disease severity, and genetic factors can influence outcomes. But there are likely hundreds of factors we do not know about or cannot measure that also influence outcomes. Randomization, when performed correctly, creates groups that are balanced on all these factors: not exactly in any single trial, but in expectation, with chance imbalances shrinking as the sample grows.
This balance is what allows us to attribute observed differences to the treatment rather than to differences between the groups. If patients receiving the new drug do better than patients receiving placebo, and the groups were comparable at baseline, the most likely explanation is that the drug works.
Randomization comes in several forms. Simple randomization effectively flips a coin for each patient, which is easy to implement but can produce imbalanced group sizes, particularly in small trials. Block randomization ensures that after every block of patients (say, 4 or 8), the groups are equal in size. Stratified randomization performs a separate randomization within subgroups defined by important prognostic factors, guaranteeing balance on those factors.
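These allocation schemes are easy to see in code. The sketch below (illustrative function names, Python standard library only) implements simple, block, and stratified randomization; an actual trial would use a validated randomization system with concealed allocation, not ad-hoc scripts.

```python
import random

def simple_randomization(n):
    """Coin flip per patient: group sizes may end up unequal."""
    return [random.choice(["A", "B"]) for _ in range(n)]

def block_randomization(n, block_size=4):
    """Within each block, exactly half the patients go to each arm,
    so group sizes never differ by more than block_size / 2."""
    assignments = []
    while len(assignments) < n:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        random.shuffle(block)
        assignments.extend(block)
    return assignments[:n]

def stratified_randomization(patients, key, block_size=4):
    """Run a separate block randomization inside each stratum
    (e.g., disease severity), guaranteeing balance on that factor."""
    strata = {}
    for p in patients:
        strata.setdefault(key(p), []).append(p)
    allocation = {}
    for members in strata.values():
        arms = block_randomization(len(members), block_size)
        for patient, arm in zip(members, arms):
            allocation[patient["id"]] = arm
    return allocation
```

Note that block randomization trades a small amount of predictability (the last assignment in a block can sometimes be deduced) for guaranteed balance, which is one reason block sizes are often varied and kept confidential.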
12.2 Blinding
If randomization protects against bias in how patients are assigned to groups, blinding (also called masking) protects against bias in how they are treated and assessed. Table 12.1 summarizes the levels of blinding and their applications.
| Blinding Level | Who Is Blinded | Protection Against | Limitations |
|---|---|---|---|
| Open-label | No one | None | Risk of assessment and performance bias |
| Single-blind | Participants only | Participant expectation effects | Investigator may influence care/assessment |
| Double-blind | Participants + Investigators | Expectation + Assessment bias | May still be unblinded by side effects |
| Triple-blind | + Data analysts | All above + Analytic bias | Operationally complex; rarely necessary |
| Blinded assessment | Outcome assessors only | Assessment bias when other blinding impossible | Limited to specific outcomes |
In a single-blind trial, participants do not know which treatment they receive, but investigators do. This prevents participant expectations from influencing subjective outcomes but does not protect against investigator bias in assessments or care.
In a double-blind trial, neither participants nor investigators know the treatment assignment. This is the gold standard for most clinical trials, eliminating both participant and investigator bias.
In a triple-blind trial, even the statisticians analyzing the data do not know which group is which until the analysis plan is executed—preventing any conscious or unconscious manipulation of the analysis.
Maintaining the blind is not always straightforward. Some drugs have distinctive side effects (certain cancer treatments cause hair loss; some psychiatric medications cause weight gain). Placebo formulations must be designed to be indistinguishable from active treatment in appearance, taste, smell, and texture. When blinding is not possible—as in surgical trials or many device studies—alternative designs and analyses must account for the limitations.
12.3 Control Groups
Clinical trials are controlled experiments, and choosing the right control group is one of the most consequential design decisions: the comparator defines the question the trial can answer.
Placebo controls compare the experimental treatment to an inert substance that is indistinguishable from the active treatment. Placebo-controlled trials provide the clearest evidence of efficacy because any difference between groups can be attributed to the pharmacological effect of the drug rather than to expectations, attention from healthcare providers, or natural fluctuations in disease.
However, placebo controls are not always ethical. When effective treatments exist, withholding them to demonstrate that a new drug is better than nothing may be unacceptable. The Declaration of Helsinki requires that new treatments generally be tested against the best current therapy, not placebo.
Active controls compare the experimental treatment to an established therapy. This is ethically appropriate when withholding treatment would be harmful, but it creates statistical challenges. Showing that a new drug is superior to an active control requires a larger sample size than showing it is better than placebo. Showing that a new drug is non-inferior—not meaningfully worse—requires careful definition of how much difference would be acceptable and raises concerns about trial quality that could mask real differences.
Historical and external controls compare study participants to patients treated in the past or in separate real-world environments. While traditionally viewed with skepticism for pivotal trials, regulatory agencies are increasingly providing structure for their use in rare diseases and high unmet need areas. In 2025, both the MHRA and EMA released draft guidelines focused on the use of external control arms derived from real-world data, emphasizing the need for transparent validation plans and “fit-for-purpose” data to support regulatory decisions (Medicines and Healthcare products Regulatory Agency 2025).
Methodologically, these designs are being advanced by digital-twin and Bayesian borrowing frameworks. Digital twins—prognostic scores generated from high-dimensional baseline data—allow for “TwinRCTs” where each patient’s outcome is adjusted by their own AI-generated digital counterpart, significantly increasing power in longitudinal studies (Ross et al. 2024). Similarly, Bayesian dynamic borrowing allows sponsors to “borrow” information from historical trials or RWD while accounting for differences in patient populations, as demonstrated in recent case studies in first-line non-small cell lung cancer (NSCLC) (Struebing et al. 2024).
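As a toy illustration of borrowing, the sketch below implements a static power prior for a normal mean with known variance, a simpler relative of the dynamic borrowing methods cited above: the power parameter `a0` down-weights the historical data (`a0 = 0` ignores it; `a0 = 1` pools it fully). The function name and the conjugate-normal setup are assumptions chosen for illustration, not the methods used in the cited case studies.

```python
import math

def power_prior_posterior(hist_mean, hist_n, curr_mean, curr_n, sigma, a0):
    """Posterior mean/sd for a normal mean with known sigma, a flat
    initial prior, and historical data down-weighted by power a0."""
    # Effective precision contributed by each data source
    prec_hist = a0 * hist_n / sigma ** 2
    prec_curr = curr_n / sigma ** 2
    post_prec = prec_hist + prec_curr
    # Precision-weighted average of historical and current estimates
    post_mean = (prec_hist * hist_mean + prec_curr * curr_mean) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return post_mean, post_sd
```

Dynamic borrowing methods go one step further by letting the data choose the degree of down-weighting, so that historical information is discounted automatically when it conflicts with the current trial.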
12.4 Parallel and Crossover Designs
Table 12.2 compares the major trial design types, each suited to different clinical questions and conditions.
| Design | Description | Advantages | Disadvantages | Best Used When |
|---|---|---|---|---|
| Parallel | Patients randomized to one treatment for study duration | Simple; No carryover concerns; Works for progressive diseases | Larger sample needed; Between-patient variability | Most confirmatory trials; Progressive conditions |
| Crossover | Each patient receives all treatments in sequence | Smaller sample; Patients serve as own controls | Carryover effects; Requires stable disease | Stable chronic conditions; PK studies |
| Factorial | Multiple interventions tested simultaneously | Tests interactions; Efficient for 2+ questions | Complex analysis; Interpretation challenges | Testing combinations; Multiple hypotheses |
| Cluster | Groups (sites, clinics) randomized, not individuals | Practical for system-level interventions | Reduced power; Complex analysis | Community interventions; Educational programs |
| Adaptive | Design modified based on interim data | Efficient; Smaller samples possible | Complex planning; Implementation challenges | Dose-finding; Rare diseases |
In a parallel design, participants are randomized to treatment groups and remain in those groups throughout the study. This is the most common design for confirmatory trials.
In a crossover design, each participant receives multiple treatments in sequence, serving as their own control. The advantage is efficiency: within-patient comparisons have less variability than between-patient comparisons, so smaller sample sizes may suffice. The disadvantages are complexity and the requirement that the condition be stable (the disease should not progress during the study) and that treatments have no lasting effects that carry over from one period to the next.
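A quick simulation makes the efficiency argument concrete. Assuming stable patients whose underlying outcome level dwarfs the measurement noise (all numbers below are invented for illustration), within-patient differences cancel the between-patient spread:

```python
import random
import statistics

random.seed(0)
n = 200
# Each simulated patient has a stable underlying outcome level
# (large between-patient spread) plus small per-visit noise.
levels = [random.gauss(0, 3) for _ in range(n)]
on_a = [lvl + random.gauss(0, 1) for lvl in levels]        # period on A
on_b = [lvl + 1.0 + random.gauss(0, 1) for lvl in levels]  # period on B (true effect = 1.0)

# Parallel-style comparisons carry the full between-patient variability...
sd_between = statistics.stdev(on_a)
# ...while within-patient differences cancel it, leaving only noise.
sd_within = statistics.stdev([b - a for a, b in zip(on_a, on_b)])
```

Here `sd_within` is far smaller than `sd_between`, which is precisely why a crossover trial can detect the same effect with fewer patients, provided there is no carryover and the disease is stable across periods.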
12.5 Sample Size
How many patients should be enrolled in a trial? The answer depends on several factors: the size of the effect we expect (or the minimum effect we consider clinically meaningful), the variability of the outcome measure, the Type I error rate we are willing to accept (the probability of declaring the drug works when it does not), and the power we require (the probability of detecting a real effect if it exists).
Larger effects are easier to detect than smaller ones. More variable outcomes require more patients to distinguish signal from noise. Lower acceptable error rates require larger samples. Higher power requirements—commonly 80% or 90%—require more patients than lower requirements.
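For a continuous endpoint compared between two arms, these four ingredients combine in the standard normal-approximation formula, sketched below (the function name is illustrative; real calculations would use validated software and the exact test planned in the SAP):

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Patients per arm for a two-arm trial with a continuous endpoint,
    using n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sigma / delta)^2,
    where delta is the smallest clinically meaningful difference and
    sigma is the outcome's standard deviation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided Type I error
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)
```

With delta equal to half a standard deviation, a two-sided alpha of 0.05, and 80% power, this gives 63 patients per arm; halving delta roughly quadruples the sample size, which is why small expected effects make trials expensive.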
In practice, sample size calculations are performed before the trial begins, based on assumptions about effect size and variability drawn from prior studies. If those assumptions prove incorrect—if the drug produces a smaller effect than expected, or if outcomes are more variable—the trial may be underpowered to detect a real effect. Adaptive designs that allow sample size re-estimation based on interim data address this concern.
12.6 Modern Master Protocols
Traditional trials test one drug in one disease. Master Protocols—basket, umbrella, and platform trials—are modern designs that test multiple therapies or multiple diseases under a single infrastructure, increasing efficiency.
Basket trials test one drug (or combination) across multiple diseases (e.g., lung, breast, colorectal) that share a common genetic mutation. Umbrella trials test multiple drugs against one disease, assigning patients to a specific arm based on their unique biomarkers. Beyond these, the Platform Trial represents a more perpetual infrastructure that can add or drop multiple treatment arms over time based on accumulating interim results, as exemplified by the I-SPY 2 study in breast cancer.
In December 2023, the FDA released updated guidance encouraging these designs to accelerate oncology and rare disease development.
The two structures can be sketched as follows. An umbrella-style design routes a single disease population into multiple treatment arms:

```mermaid
flowchart TB
    Pop((Cancer))
    Arm1[Drug A]
    Arm2[Drug B]
    Arm3[Std of Care]
    Pop --> Arm1
    Pop --> Arm2
    Pop --> Arm3
```

A basket design instead tests one drug across multiple tumor types:

```mermaid
flowchart TB
    DrugA(("Drug A"))
    Dis1[Melanoma]
    Dis2[Lung]
    Dis3[Thyroid]
    DrugA --> Dis1
    DrugA --> Dis2
    DrugA --> Dis3
```
In a basket trial, patients with different tumor types who share a common molecular target (e.g., BRAF V600E mutation) are enrolled together and treated with the same targeted therapy. In an umbrella trial, patients with a single disease (e.g., non-small cell lung cancer) are screened for multiple biomarkers and assigned to treatment arms based on their molecular profile.
12.7 The Statistical Analysis Plan
Before data from a clinical trial are unblinded, the statistical analysis plan (SAP) should be finalized. Modern SAPs must align with the ICH E9(R1) Estimands Framework, which requires the treatment effect of interest to be precisely defined before any data are collected.
The estimand precisely defines the population being analyzed, the specific endpoint variable, and the strategies for handling intercurrent events—disruptions such as treatment discontinuation, death, or the use of rescue medication. Common approaches include the treatment policy strategy, which follows the intent-to-treat principle by using all data regardless of adherence; the hypothetical strategy, which estimates outcomes as if the drug had been taken as prescribed; and the composite strategy, which incorporates the intercurrent event directly into the definition of the endpoint itself.
The importance of pre-specifying the analysis cannot be overstated. A p-value of 0.05 means something only if the test was specified in advance.
12.8 The Protocol
All design decisions are documented in the protocol, the foundational document that governs how the trial will be conducted. A well-written protocol leaves little room for interpretation. It specifies exactly who can participate, what treatments will be given and how, what assessments will be performed and when, and how data will be collected and analyzed.
The protocol is not merely an internal document—it is reviewed by IRBs to assess participant protection, by regulatory authorities to assess scientific adequacy, and by investigators to understand their responsibilities. Once approved, deviations from the protocol are tracked and reported.
When design changes are needed after a trial has begun, they are implemented through protocol amendments. Major amendments—those affecting safety, the primary endpoint, or inclusion/exclusion criteria—must be approved by the IRB before implementation. Minor amendments may require only notification. All amendments should be carefully considered, as frequent changes can introduce operational challenges and raise questions about the scientific basis of the trial.