Written by Susan Miller

Precision English for EMA Submissions: Crafting Benefit–Risk Narratives for AI-Enabled Drug–Device Interfaces

Struggling to phrase AI-driven benefit–risk in EMA terms without overreach? In this lesson, you’ll learn to craft regulator-ready narratives for AI-enabled SaMD in drug–device combinations—anchoring EMA definitions, mapping evidence to claims, and applying a four-part template across RMP, PSUR, and device interfaces. Expect crisp explanations, annotated phrasing exemplars, and targeted exercises to lock in calibrated language and uncertainty handling. Leave with a reusable structure, harmonized terminology, and wording that shortens reviews and stands up to audit.

Step 1 – Anchor the EMA context and definitions

When drafting benefit–risk narratives for AI-enabled Software as a Medical Device (SaMD) embedded in a drug–device combination, start by aligning your language with how the European Medicines Agency (EMA) frames benefit and risk. EMA evaluates benefit as the positive therapeutic effect of the medicinal product under normal conditions of use, and risk as any undesirable effect or uncertainty that may compromise patient safety or prevent the benefit from being realized. For AI-enabled SaMD, the software contributes to clinical decisions or device control that influence dosing, timing, or patient selection. This means the software’s function directly affects both the magnitude and the reliability of benefit. Consequently, your narrative must connect the software’s intended clinical effect to the medicinal product’s benefit while formally capturing software-specific risks and mitigations.

In combination products, claims and supporting evidence are distributed across several regulatory documents. Understand where each claim “lives” so your benefit–risk language remains consistent and auditable:

  • SmPC (Summary of Product Characteristics) and IB (Investigator’s Brochure): These anchor labeled claims and investigator-level rationale. The SmPC communicates intended use, indications, and key safety information for routine clinical practice; the IB supports investigators’ understanding during clinical development. For AI-enabled components, statements about the software’s role in dosing or patient identification should be conservative and consistent with validated indications.
  • RMP (Risk Management Plan): This is where you define important identified risks, important potential risks, and missing information. For AI, highlight data-dependence, generalizability constraints, algorithm updates, human factors, and cybersecurity as risk sources. Link each to specific risk minimization measures and pharmacovigilance activities.
  • PSUR (Periodic Safety Update Report): This document synthesizes post-marketing evidence on safety and effectiveness. For AI-enabled SaMD, include real-world performance metrics, drift signals, and corrective actions when model or data shifts are detected.
  • CER/CEP (Clinical Evaluation Report/Plan) and Notified Body interfaces (as relevant for the device component): These documents support conformity for the device/SaMD portion. Maintain alignment between device-side claims (performance and safety) and medicinal product-side claims (benefit–risk) to prevent inconsistency.

In this context, benefit means the clinical advantage gained because the AI-enabled software improves the correct use of the medicinal product (e.g., better dosing precision or improved patient selection). Risk includes clinical, technical, and organizational hazards introduced or modified by the software, such as misclassification, latency, degraded performance on new populations, human–AI interaction failures, or unsafe responses to rare edge cases. Because the AI contributes to decision-making, uncertainty—both epistemic (limits of knowledge) and aleatory (inherent variability)—must be explicitly managed. EMA expects traceability from claims to evidence, a balanced acknowledgment of limitations, and clear monitoring commitments.

A crucial aspect is transparency about the AI itself: describe model type (if appropriate), data provenance, intended operating conditions, known limitations, and the governance of updates. Avoid overstated claims; use controlled verbs and precise modality to signal the strength of evidence (e.g., “supports,” “is consistent with,” “demonstrates under specified conditions”). This calibrated phrasing helps regulators understand what is validated, what is reasonably anticipated, and what is still under observation.

Step 2 – Map evidence to claims

Your narrative should separate three tiers of evidence: analytical/technical performance, clinical performance, and clinical utility. Each tier demands distinct phrasing aligned to EMA expectations and must connect to risk and mitigations.

  • Analytical/technical performance: This covers whether the software performs its intended functions correctly under specified conditions. For AI-enabled SaMD, include metrics like accuracy, precision, sensitivity/specificity, latency, robustness to noise, security, and reliability under expected device constraints. Use bounded statements and conditions. Acceptable phrasing includes:

    • “Analytical performance demonstrates [metric] within [confidence intervals] under [defined environment].”
    • “Robustness testing indicates resilience to [specified perturbations] within predefined acceptance criteria.”
    • “Performance is maintained for [hardware/firmware version] and [software version] with [validation dataset characteristics].”

    Link these statements to device-level risk controls such as input validation, fail-safes, and cybersecurity measures. Emphasize repeatability, traceability of versions, and lock/unlock criteria for updates. (A sketch of how an interval-bounded metric claim can be computed follows this list.)
  • Clinical performance: This addresses whether the software outputs correlate with clinical reference standards or clinically accepted measures. For example, a model predicts a dosing range or flags patients with a specific risk profile. Use cautious, evidence-bound formulations that respect the validation study design:

    • “Clinical performance aligns with reference standard X, achieving [metric] in [population] under [study conditions].”
    • “Subgroup analysis indicates stable performance for [subgroups] with pre-specified non-inferiority margins met.”
    • “External validation demonstrates generalizability to [site/region] with no clinically meaningful degradation beyond acceptance thresholds.”

    Integrate limitations explicitly, especially if there are known performance gaps in underrepresented subgroups. Tie these to mitigations, such as user training, confirmation prompts, or the requirement for human oversight.
  • Clinical utility: This is the critical link to benefit. It asks whether using the software leads to improved clinical outcomes or safer use relative to current practice. EMA expects clarity on what outcome is affected and how strongly the evidence supports it. Suitable phrasing includes:

    • “Use of the AI-enabled interface is associated with [improved outcome], compared with [comparator], in [defined population], with [effect size] and [uncertainty bounds].”
    • “Evidence supports a reduction in [medication error rate/adverse event] attributable to [software function], under routine conditions reflected in [study design].”
    • “The observed benefit is contingent on adherence to [user training/protocol], with compliance rates of [x%] during evaluation.”

    Clinical utility statements must be paired with risk characterizations that describe residual risks, populations with unknown effects, and operational dependencies. Avoid general claims of clinical effectiveness without specifying design, comparator, and the conditions under which the effect was observed.
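
To keep an interval-bounded claim such as “[metric] within [confidence intervals]” auditable, the cited interval should come from a defined, reproducible calculation. Here is a minimal Python sketch using a Wilson score interval for a proportion such as sensitivity; the counts and the 95% level are illustrative assumptions, not values from any actual validation study.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical validation counts: 182 of 200 positive cases correctly flagged.
low, high = wilson_ci(182, 200)
print(f"Sensitivity 0.910 (95% CI: {low:.3f}-{high:.3f})")
```

The same pattern applies to specificity or agreement rates: every bounded phrase in the narrative should trace back to a scripted, version-controlled computation on an identified dataset.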

Across all tiers, state uncertainty explicitly. For AI, declare training and validation data distributions, versioning status (locked model vs permissible update), and governance around post-market modifications. When a claim depends on specific datasets, state that dependence. When real-world data collection is ongoing, specify what will be measured and how it will trigger corrective or preventive actions. This explicit map from evidence to claims and then to monitoring maintains regulatory coherence and supports lifecycle management.
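
The commitment to “trigger corrective or preventive actions” is most credible when the drift signal and its threshold are pre-specified. As one illustration, the sketch below computes a Population Stability Index (PSI) between a reference (validation) feature distribution and live post-market inputs; PSI is one common drift metric among several, and the simulated data and 0.2 trigger are assumptions for illustration only.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between reference and live input distributions.

    Simplified: live values outside the reference range are ignored.
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)  # validation-set feature distribution
live = rng.normal(0.5, 1.0, 5000)       # shifted post-market inputs
DRIFT_TRIGGER = 0.2                     # hypothetical pre-specified threshold

score = psi(reference, live)
print(f"PSI = {score:.3f}")
if score > DRIFT_TRIGGER:
    print("Signal exceeds trigger: log it, start root-cause review, consider rollback")
```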

Step 3 – Apply the four-part narrative template

Use a consistent, auditable narrative scaffold. Each SaMD benefit–risk section can be constructed with four parts and controlled sentence stems.

1) Intended use

  • “The AI-enabled software component is intended to [describe function] within the [drug–device combination name], to support [clinical decision/process] for [target population] in [care setting].”
  • “The software operates under [specified conditions, inputs, and workflow], requiring [user role] oversight, and is not intended for [excluded use cases].”
  • “Use is limited to [indications consistent with SmPC], and interfaces with [device elements] per [version/compatibility matrix].”

2) Benefit statement with evidence

  • “When used as intended, the software supports realization of the medicinal product’s benefit by [mechanism of action at the decision level, e.g., dose individualization or patient selection].”
  • “Analytical performance meets predefined acceptance criteria for [metrics], established using [datasets/environments].”
  • “Clinical performance is consistent with [reference standard/comparator] in [population], with [metrics and confidence intervals].”
  • “Clinical utility evidence indicates [improvement in outcome or safety measure], compared with [comparator], under [study conditions], with [effect size] and [uncertainty bounds].” (An effect-size calculation sketch follows this list.)
  • “Benefit depends on adherence to [training/protocol] and [system prerequisites], with observed compliance of [x%].”
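
To give a concrete sense of “[effect size] and [uncertainty bounds]” for a benefit statement, here is a minimal sketch computing an absolute risk reduction with a simple Wald 95% interval. The event counts are hypothetical, and a real submission would use the pre-specified analysis from the study protocol rather than this simplified formula.

```python
import math

def arr_with_ci(events_ctrl: int, n_ctrl: int, events_trt: int, n_trt: int,
                z: float = 1.96) -> tuple[float, float, float]:
    """Absolute risk reduction (control minus treatment) with a Wald 95% CI."""
    p_ctrl, p_trt = events_ctrl / n_ctrl, events_trt / n_trt
    arr = p_ctrl - p_trt
    se = math.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_trt * (1 - p_trt) / n_trt)
    return arr, arr - z * se, arr + z * se

# Hypothetical counts: dosing errors without vs. with the AI-enabled interface.
arr, low, high = arr_with_ci(events_ctrl=48, n_ctrl=600, events_trt=29, n_trt=600)
print(f"ARR {arr:.1%} (95% CI: {low:.1%} to {high:.1%})")
```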

3) Risk characterization with mitigations

  • “Identified risks include [misclassification/over-reliance/human factors/cybersecurity/data drift], arising from [root causes such as data shift or UI complexity].”
  • “Important potential risks include [generalizability gaps/rare edge cases], especially in [underrepresented subgroups] where data are limited.”
  • “Risk minimization measures comprise [user training, confirmation steps, thresholds, alarms, fallbacks], and technical controls such as [input validation, fail-safe modes, version control].”
  • “Pharmacovigilance activities and device surveillance will monitor [key performance indicators, error rates, subgroup performance, drift signals], with predefined triggers for [corrective action/labeling update/additional training].”
  • “Residual risk remains for [specified scenarios], communicated to users via [SmPC/IFU training materials], with instructions for [human oversight and clinical judgment].”

4) Overall conclusion with uncertainty and monitoring

  • “Overall, the benefit–risk balance is considered [favorable/positive] for the intended population and conditions of use, based on [summarize strength of evidence].”
  • “Uncertainties persist regarding [data-limited subgroups, long-term performance, rare events], which are addressed through [post-market studies, registry data, periodic reviews].”
  • “The software is subject to [change control and version governance], with communication procedures for [user notification, documentation updates, retraining requirements].”
  • “PSUR and RMP will be updated with [pre-specified metrics and findings], ensuring ongoing alignment with EMA expectations for lifecycle safety and effectiveness.”

This template ensures that the software function is properly linked to the medicinal product’s outcome claims and that risks are framed within EMA’s established categories (important identified risks, important potential risks, missing information), with explicit mitigations and monitoring.
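
Because the scaffold is placeholder-driven, some teams automate a completeness check so that no bracketed stem survives into a submission draft. The sketch below is a minimal illustration of that idea in Python; the stem is the first sentence from part 1 above, and the product name and filled-in values are hypothetical.

```python
import re

INTENDED_USE_STEM = (
    "The AI-enabled software component is intended to [describe function] "
    "within the [drug-device combination name], to support "
    "[clinical decision/process] for [target population] in [care setting]."
)

def fill(stem: str, values: dict[str, str]) -> str:
    """Substitute bracketed placeholders; fail loudly if any remain unfilled."""
    text = stem
    for key, value in values.items():
        text = text.replace(f"[{key}]", value)
    leftover = re.findall(r"\[([^\]]+)\]", text)
    if leftover:
        raise ValueError(f"Unfilled placeholders: {leftover}")
    return text

print(fill(INTENDED_USE_STEM, {
    "describe function": "recommend a titration step",
    "drug-device combination name": "Product X pen-injector system",  # hypothetical
    "clinical decision/process": "dose individualization",
    "target population": "adults with type 2 diabetes",
    "care setting": "outpatient care",
}))
```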

Step 4 – Adapt for key EMA documents

While the structure remains stable, the emphasis and phrasing should shift to fit specific EMA documents and interactions. Use concise, controlled language that mirrors each document’s purpose.

  • RMP sections (focus on risk definitions and controls)

    • Intended use: “The AI-enabled module supports [clinical decision] within [indication]. Use requires [user role] oversight.”
    • Risks: “Important identified risks: [list]. Important potential risks: [list]. Missing information: [populations/data contexts].”
    • Mitigations: “Routine risk minimization: [labeling/training]. Additional risk minimization: [restricted access/decision thresholds]. Pharmacovigilance: [specific monitoring, drift detection, subgroup audits].”
    • Monitoring: “Predefined signals include [KPI thresholds]. Corrective actions include [model rollback/update freeze/user communication].”
    • Do: tie each risk to a concrete mitigation and a measurable signal. Don’t: claim risk elimination; communicate risk reduction and residual risk explicitly.
  • PSUR updates (focus on post-market evidence and trends)

    • Performance summary: “Post-authorization data show [metrics] with [confidence intervals], consistent with pre-authorization performance.”
    • Safety/benefit signals: “Observed changes in [error rate/adverse events] temporally associated with [software version change or data shift]. Root cause analysis indicates [factor]. Corrective actions: [details].”
    • Subgroup insights: “No clinically meaningful degradation in [subgroups], except [group], where performance fell below [threshold], prompting [mitigation].”
    • Benefit reinforcement: “Real-world utility remains consistent with [effect size], after adjustment for [confounders].”
    • Do: include version timelines and data provenance. Don’t: aggregate across versions without stratification (see the stratification sketch after this list).
  • Briefing notes/teleconference phrasing (focus on clarity and auditability)

    • “Our claims are limited to [specific outcome] under [conditions].”
    • “Clinical utility is supported by [study type], with [effect size] and [limitations].”
    • “We maintain a locked model policy for [duration], with any updates subject to [validation and notification].”
    • “We monitor [KPIs] continuously and will report deviations >[threshold] in the next PSUR or via [signal channels].”
    • Do: speak in short, verifiable statements tied to documents. Don’t: imply adaptive self-improvement without governance.
  • CER/CEP and Notified Body interactions (alignment with device performance and safety)

    • “Analytical performance is verified against [standards], with stress testing for [conditions].”
    • “Clinical performance aligns with [reference], with predefined acceptance criteria met.”
    • “Human factors validation confirms usability with [user roles], addressing [error-prone steps] via [UI controls].”
    • Do: cross-reference medicinal product claims without expanding them. Don’t: introduce device claims that imply medicinal efficacy beyond the SmPC.
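
To make the stratification rule concrete, here is a minimal Python sketch that summarizes a post-market prediction log by model version, so a regression in one version is not masked by a pooled average. The event log, versions, and values are invented for illustration.

```python
import pandas as pd

# Hypothetical post-market log: one row per prediction, with the model version
# in use and whether the output matched the adjudicated reference.
events = pd.DataFrame({
    "model_version": ["2.2", "2.2", "2.2", "2.3", "2.3", "2.3"],
    "correct":       [1,     1,     1,     1,     0,     0],
})

# Stratify by version; a pooled mean would hide the drop in version 2.3.
summary = events.groupby("model_version")["correct"].agg(n="count", accuracy="mean")
print(summary)
```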

Across all documents, use controlled verbs and modality to calibrate certainty:

  • Strong evidence: “demonstrates,” “meets predefined criteria,” “is consistent with.”
  • Moderate evidence: “supports,” “is associated with,” “suggests under specified conditions.”
  • Conditional/limited generalizability: “is expected to,” “may vary depending on,” “has not been established in.”

Include transparency cues that are specific to AI behavior:

  • Data dependence: “Performance estimates are contingent on data distributions similar to [training/validation sets].”
  • Generalizability: “External validation indicates stable performance across [sites], with caution for [underrepresented groups].”
  • Model changes: “Version [x.y] remains in use; any update triggers revalidation per [SOP] and user communication within [timeframe].” (A version-gating sketch follows this list.)
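
Statements like the one above are typically backed by an explicit compatibility matrix under change control. The sketch below shows one minimal way to gate use on validated software/firmware pairings; the matrix, version numbers, and error message are hypothetical.

```python
# Hypothetical validated software/firmware pairings from the version-governance SOP.
COMPATIBILITY = {
    ("2.3", "1.7"): "validated",
    ("2.3", "1.6"): "validated",
    ("2.2", "1.6"): "validated",
}

def check_pairing(software: str, firmware: str) -> None:
    """Block use of any pairing that has not completed revalidation."""
    if COMPATIBILITY.get((software, firmware)) != "validated":
        raise RuntimeError(
            f"Software {software} with firmware {firmware} is not validated; "
            "revalidation per SOP is required before use."
        )

check_pairing("2.3", "1.7")  # passes; check_pairing("2.4", "1.7") would raise
```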

Finally, make your narrative traceable. Explicitly link claims to evidence tables, protocol identifiers, and dataset descriptors. Note where each statement will be maintained or updated (SmPC, RMP, PSUR), and keep all phrasing aligned across documents. This disciplined approach assures regulators that your benefit–risk narrative for AI-enabled SaMD within a drug–device combination is precise, balanced, and auditable throughout the product lifecycle.

  • Align all benefit–risk language with EMA definitions, linking the AI software’s intended clinical effect to medicinal product benefit and explicitly capturing software-specific risks, limitations, and governance of updates.
  • Map claims to three evidence tiers—analytical/technical, clinical performance, and clinical utility—using bounded, condition-specific phrasing, explicit uncertainty, and clear ties to risk controls and mitigations.
  • Use a consistent four-part narrative (intended use; benefit with evidence; risks with mitigations and monitoring; overall conclusion with uncertainties and lifecycle governance) to ensure traceability and auditability.
  • Adapt emphasis and calibrated verbs per document (RMP, PSUR, CER/CEP, briefing notes), stratify results by model/version, and maintain alignment across documents with transparent data provenance and version control.

Example Sentences

  • Clinical utility evidence indicates a reduction in dosing errors attributable to the AI-enabled titration module, with an absolute risk reduction of 3.1% under routine outpatient conditions.
  • Analytical performance demonstrates AUC 0.91 (95% CI: 0.88–0.94) for patient selection on version 2.3 when used with firmware 1.7 and validated sensor inputs.
  • Identified risks include misclassification due to data drift and over-reliance by novice users, mitigated through drift detection thresholds and mandatory confirmation steps.
  • External validation is consistent with pre-authorization performance in EU sites, with no clinically meaningful degradation beyond predefined acceptance limits in older adults.
  • Overall, the benefit–risk balance is considered favorable for the indicated population; uncertainties remain for pregnancy and rare genetic subgroups and will be addressed via registry follow-up and PSUR updates.

Example Dialogue

  • Alex: For the RMP, keep the claim narrow: “The AI module supports dose individualization under physician oversight,” and list data drift and human–AI interaction errors as important identified risks.
  • Ben: Got it; I’ll tie each risk to a mitigation—confirmation prompts, version control, and predefined KPI triggers for rollback.
  • Alex: In the PSUR, say, “Real-world performance is consistent with pre-authorization metrics,” and stratify by model version to avoid masking degradation.
  • Ben: And for briefing notes, I’ll add, “Our claims are limited to reduced medication errors under specified workflow conditions; any update follows a locked-model policy with revalidation.”

Exercises

Multiple Choice

1. Which statement best aligns with EMA-calibrated phrasing for analytical/technical performance in an AI-enabled SaMD within a combination product?

  • The software proves it works perfectly in all real-world settings.
  • Analytical performance demonstrates sensitivity and specificity within predefined acceptance criteria under validated environmental conditions.
  • Clinical utility confirms that the AI cures the condition in most patients.
  • The model self-improves over time without governance, ensuring ongoing benefit.

Correct Answer: Analytical performance demonstrates sensitivity and specificity within predefined acceptance criteria under validated environmental conditions.

Explanation: EMA expects bounded, evidence-linked claims for analytical performance. Phrasing should specify metrics, acceptance criteria, and defined conditions; it must avoid overstatements like “perfectly” or governance-free adaptation.

2. Where should data drift signals and version-stratified real-world metrics primarily be summarized post-authorization?

  • SmPC
  • Investigator’s Brochure (IB)
  • Periodic Safety Update Report (PSUR)
  • Clinical Evaluation Plan (CEP)

Correct Answer: Periodic Safety Update Report (PSUR)

Explanation: PSUR synthesizes post-marketing evidence, including performance trends, drift signals, and corrective actions, and should stratify by model version to avoid masking degradation.

Fill in the Blanks

In the RMP, important identified risks for AI-enabled SaMD should include ___ and human–AI interaction failures, each linked to specific risk minimization measures.

Correct Answer: data drift

Explanation: The lesson emphasizes listing AI-specific risks such as data drift and data dependence in the RMP, each tied to concrete risk minimization measures and pharmacovigilance activities.

Clinical utility statements must specify the affected outcome, comparator, and conditions, and include the ___ bounds to reflect uncertainty.

Correct Answer: uncertainty

Explanation: EMA expects calibrated claims with explicit uncertainty (e.g., confidence intervals or credible intervals) for clinical utility to avoid overstatement.

Error Correction

Incorrect: Analytical performance proves the AI always meets criteria across all populations and versions without conditions.

Correct Sentence: Analytical performance demonstrates predefined metrics within acceptance criteria under specified conditions and versions.

Explanation: Claims must be bounded and conditional, referencing defined environments and versioning; avoid universal, unconditional assertions like “always.”

Incorrect: Our claims show the AI reduces adverse events in general practice, and future updates will auto-deploy without revalidation.

Correct Sentence: Our claims are limited to reduced medication errors under specified workflow conditions, and any update follows change control with revalidation and user notification.

Explanation: Phrasing should be precise about scope (“specified workflow conditions”) and reflect governance over updates (change control, revalidation, communication), consistent with EMA expectations.