Written by Susan Miller

Published on: Sep 30, 2025

Choosing Your Format for High-Stakes Delivery: Given When Then vs Bullet Acceptance Criteria

Shipping under audits and SLAs but unsure whether to write scenarios or a checklist? This lesson shows you exactly when to use Given–When–Then versus Bullet Acceptance Criteria, and how to convert between them without losing precision. You’ll get a crisp decision guide, high-signal examples, and targeted exercises (MCQs, fill‑ins, and corrections) to lock in specificity, measurability, and testability. By the end, you will be able to author hybrid criteria that map cleanly to tests, monitoring, and go/no‑go gates.

Step 1 — Orient to the decision: Why format matters and what “good” looks like

In high-stakes delivery, acceptance criteria are not just notes on a ticket; they are the enforceable contract for expected behavior. They anchor what the system must do, how it must do it, and under what precise conditions success will be declared. Because they underpin your test plans, monitoring, and ultimately your KPIs and SLOs, any vagueness in acceptance criteria will propagate. It will dilute test coverage, create interpretive gaps during development, and weaken your ability to audit outcomes. When a release must survive compliance audits, customer SLAs, or rigorous incident review, clarity and precision are non-negotiable.

Two widely used formats help teams capture this contract clearly. The first is Given–When–Then (GWT), a Behavior-Driven Development syntax that describes the behavior of a system as scenarios. GWT expresses three core parts: preconditions (Given), the triggering action or event (When), and the observable outcome (Then). The format also allows additional lines using “And” to extend any of these parts. Because it narrates state, trigger, and outcome, GWT helps reveal missing assumptions and drives explicit thinking about inputs and edge cases. The second format is Bullet Acceptance Criteria (BAC), an itemized checklist of verifiable conditions. BAC aims for concision and scan-ability; each bullet is a discrete, testable condition and is often numbered to support traceability and prioritization. Unlike GWT, BAC is not inherently narrative; it emphasizes completeness and measurability across conditions, thresholds, and cross-cutting constraints.
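To make the two shapes concrete, the sketch below models a GWT scenario and a BAC checklist as plain Python data and renders each in its usual text form. The class names (`Scenario`, `Checklist`), their fields, and the sample thresholds are illustrative assumptions, not a schema from any particular BDD tool.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Given-When-Then: preconditions, one trigger, and observable outcomes."""
    title: str
    given: list[str]
    when: str
    then: list[str]

    def render(self) -> str:
        lines = [f"Scenario: {self.title}"]
        # The first line of each part carries its keyword; extra lines use "And".
        for i, precondition in enumerate(self.given):
            lines.append(f"  {'Given' if i == 0 else 'And'} {precondition}")
        lines.append(f"  When {self.when}")
        for i, outcome in enumerate(self.then):
            lines.append(f"  {'Then' if i == 0 else 'And'} {outcome}")
        return "\n".join(lines)


@dataclass
class Checklist:
    """Bullet Acceptance Criteria: numbered, independently verifiable conditions."""
    items: list[str] = field(default_factory=list)

    def render(self) -> str:
        return "\n".join(f"{n}. {item}" for n, item in enumerate(self.items, start=1))


if __name__ == "__main__":
    scenario = Scenario(
        title="Premium user downloads an invoice",
        given=["a premium user", "the user holds a valid token"],
        when="the user requests the invoice PDF",
        then=["the API returns HTTP 200", "the download completes in ≤ 2 s at P95"],
    )
    checklist = Checklist(items=[
        "P95 invoice API latency ≤ 300 ms",
        "Audit log records user_id, request_id, and UTC timestamp",
    ])
    print(scenario.render())
    print(checklist.render())
```

The scenario keeps state, trigger, and outcome in separate fields, while the checklist is just an ordered list of conditions; that difference is the structural contrast described above.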

Regardless of which format you choose, a shared quality bar applies. Effective acceptance criteria are:

Specific: Avoid generalities; name actors, system components, data, and interfaces.
Observable: Only include outcomes that can be seen or measured from outside the implementation.
Measurable: Use quantitative thresholds or exact outputs, not adjectives like “fast,” “secure,” or “reliable.”
Atomic: Each item captures one idea. Splitting reduces ambiguity and simplifies testing.
Independent: As much as possible, each item stands alone for pass/fail evaluation.
Consistent with constraints: Align with architectural, regulatory, and operational limits.
Traceable: Link each criterion to its source requirement and the metrics that will verify it.

Holding both GWT and BAC to this bar ensures that your criteria can survive scrutiny, support test automation, and map cleanly to monitoring and audit artifacts.

Step 2 — Compare formats: Given–When–Then vs Bullet Acceptance Criteria

The two formats serve overlapping but distinct needs. Understanding their strengths and limits will help you choose the right tool for each problem.

GWT strengths include narrative clarity for stateful flows, complex rules, and edge cases. Because GWT explicitly separates preconditions, triggers, and outcomes, it reduces ambiguity about what must be true before an action, what event occurs, and what a stakeholder should observe afterwards. This structure maps directly to executable test scenarios, encouraging teams to think in examples and data variations. When behavior depends on roles, feature flags, time windows, or input permutations, GWT makes those dependencies explicit. It also facilitates collaboration between product, QA, and engineering because the scenario language resembles natural speech while remaining precise.

The limits of GWT show up when complexity grows combinatorially. For features with many permutations of roles, inputs, and states, scenarios can proliferate, making the set difficult to manage and maintain. This “scenario explosion” can obscure the global rules and slow down review. Further, GWT is not ideal for non-functional requirements (NFRs) that span scenarios, such as performance thresholds, logging obligations, data retention rules, accessibility conformance, or security controls. While you can attach such constraints to scenarios, they are harder to scan and enforce consistently when distributed across many scenario descriptions.

BAC strengths center on speed of comprehension and coverage of cross-cutting constraints. A numbered list of conditions is easy to read, prioritize, and map to tests. BAC excels at capturing NFRs, acceptance thresholds, compliance rules, and policies that apply across many behaviors, such as latency SLOs, error budgets, authentication standards, or audit logging fields. Because BAC is condensed, it is well suited to reviews with stakeholders who need to validate that every constraint is represented without reading long narratives.

BAC has limits of its own. Preconditions can remain implied rather than stated, which creates hidden assumptions. Without explicit context, implementers may misinterpret the scope of a bullet or its applicability across states and roles. Additionally, for complex stateful flows, BAC is less expressive: describing state transitions, branching outcomes, and edge behaviors in bullets can become awkward and ambiguous. In such cases, bullets risk collapsing into vague statements that are hard to test reliably.

A practical decision guide helps you select a format:

Choose GWT when the user journey and trigger matter; outcomes depend on state, role, or inputs; you want executable, example-driven scenarios; or you are defining “what happens when …” for a flow.
Choose BAC when you are stating universal rules, thresholds, or one-off verifiable conditions; you need a concise checklist for stakeholders; or you are capturing NFRs such as latency, availability, security, and audit obligations.
Consider a hybrid: use BAC to declare global constraints and KPIs once, and GWT to illustrate the critical flows that must operate under those constraints. This keeps NFRs centralized and scenarios focused.
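The decision guide can be read as a small decision function. The sketch below encodes it with two hypothetical yes/no traits a team might note while triaging a requirement; real triage involves more judgment than two booleans, so treat this as a mnemonic rather than a rule engine.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """Hypothetical triage traits for one requirement."""
    outcome_depends_on_state: bool  # role, flags, inputs, or time change the result
    has_cross_cutting_rules: bool   # latency, availability, security, audit obligations


def choose_format(req: Requirement) -> str:
    """GWT for state-dependent flows, BAC for universal rules, a hybrid when both apply."""
    if req.outcome_depends_on_state and req.has_cross_cutting_rules:
        return "Hybrid: BAC for global constraints, GWT for critical flows"
    if req.outcome_depends_on_state:
        return "GWT"
    # Universal rules, thresholds, and one-off verifiable conditions.
    return "BAC"


if __name__ == "__main__":
    payout = Requirement(outcome_depends_on_state=True, has_cross_cutting_rules=True)
    print(choose_format(payout))
```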

Step 3 — Write and convert: patterns and pitfalls

When writing acceptance criteria, decide early whether the problem is primarily about behavior under conditions (favor GWT) or about satisfying a set of universal constraints (favor BAC). Regardless, keep your language disciplined and test-oriented. Start by identifying the actors (users, services), the relevant state (role, data presence, feature flags), the trigger, and the expected observable outcomes or thresholds. Then select the format that best surfaces those elements without overcomplication.

If you begin with GWT, the scenario should state the precondition, the trigger, and the outcome in concrete terms. Because GWT invites example thinking, it encourages you to specify roles, data sets, and the exact signals that confirm success or failure. When you convert a scenario into BAC, turn each Then/And line into a single, testable bullet, and add measurable thresholds where needed. A key skill is recognizing where GWT’s narrative hides an implicit threshold (for example, speed) and making it explicit in BAC with numeric bounds. This distillation produces checklists that testing and operations teams can verify consistently.
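The mechanical half of the GWT-to-BAC conversion (pulling the outcome lines out of a scenario and numbering them) can be sketched as below. The function name is hypothetical, and it assumes one scenario per string with one keyword-led step per line. Spotting a hidden threshold and adding a numeric bound remains a human judgment; the function copies outcomes exactly as written.

```python
def then_lines_to_bullets(scenario: str) -> list[str]:
    """Turn the Then/And outcome lines of one GWT scenario into numbered bullets.

    Given lines, their And continuations, and the When line are skipped;
    only outcomes become checklist items. Thresholds are copied, not invented.
    """
    outcomes: list[str] = []
    in_outcomes = False
    for raw in scenario.splitlines():
        keyword, _, rest = raw.strip().partition(" ")
        if keyword == "Then":
            in_outcomes = True
        elif keyword in ("Given", "When"):
            in_outcomes = False
            continue
        elif keyword != "And" or not in_outcomes:
            continue  # precondition "And" lines, titles, or blank lines
        outcomes.append(rest)
    return [f"{n}. {text}" for n, text in enumerate(outcomes, start=1)]


if __name__ == "__main__":
    scenario = """Given a premium user with a valid token
When they request the invoice PDF
Then the API returns HTTP 200
And the download completes in ≤ 2 s at P95"""
    for bullet in then_lines_to_bullets(scenario):
        print(bullet)
```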

If you begin with BAC, each bullet should be verifiable with a single pass/fail outcome and avoid bundling multiple ideas. For bullets where behavior is state-dependent or especially error-prone, translate them into at least one GWT scenario. This conversion deepens clarity: preconditions once implied become “Given” lines; triggers once ambiguous become “When”; outcomes become “Then,” aligned with observable signals such as response codes, UI changes, or emitted logs. This back-and-forth fosters a shared understanding and reveals gaps like missing actors or undefined failure responses.
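Going the other way, a bullet whose outcome varies by state, such as a hypothetical “Only admins may export user data; other roles receive HTTP 403,” can be expanded into one scenario per precondition. The function name and the role/status pairs below are illustrative assumptions.

```python
def expand_bullet(trigger: str, outcome_by_precondition: dict[str, str]) -> list[str]:
    """Expand one state-dependent bullet into one GWT scenario per precondition.

    Each key becomes an explicit Given line (a precondition the bullet left
    implied) and each value becomes the observable Then line for that state.
    """
    return [
        f"Given {given}\nWhen {trigger}\nThen {then}"
        for given, then in outcome_by_precondition.items()
    ]


if __name__ == "__main__":
    scenarios = expand_bullet(
        "they request the user export",
        {
            "an authenticated admin": "the API returns HTTP 200",
            "an authenticated viewer": "the API returns HTTP 403",
            "an unauthenticated client": "the API returns HTTP 401",
        },
    )
    print("\n\n".join(scenarios))
```

Writing out the unauthenticated case forces the team to pick a failure response (401 is assumed here), which is exactly the kind of gap this conversion is meant to expose.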

While drafting either format, watch for common language pitfalls:

Avoid vague verbs and adjectives such as “supports,” “handles,” “improves,” “fast,” or “secure.” Replace them with measurable, observable statements like “returns HTTP 403,” “persists the record,” “renders in ≤ 200 ms at P95,” “encrypts data at rest with AES-256,” or “encrypts data in transit with TLS 1.3.”
Do not hide the actor. Always specify who or what performs the action: “premium user,” “billing service,” “admin API,” or “scheduled job.” This prevents cross-team misreadings.
Do not bundle multiple conditions into one bullet or one Then/And line. Compound statements hide failure modes and undermine testability. Split them into atomic, independent items.
Do not leave preconditions implied. In BAC, if assumptions are global (for example, “user is authenticated,” “data exists”), state them in a short preface or scope statement so reviewers and testers understand the boundaries.
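A cheap first line of defense against these pitfalls is a word-list check run over draft criteria. The sketch below flags the vague terms named in this lesson (“supports,” “handles,” “improves,” “fast,” “secure,” “reliable,” “good”); the function name is hypothetical, and a whole-word match is only a heuristic, so it will miss paraphrases and other forms of a word.

```python
import re

# Vague terms called out in this lesson; extend with your team's own offenders.
VAGUE_TERMS = ("supports", "handles", "improves", "fast", "secure", "reliable", "good")


def find_vague_terms(criterion: str) -> list[str]:
    """Return the vague terms that appear as whole words in a criterion."""
    return [
        term
        for term in VAGUE_TERMS
        if re.search(rf"\b{re.escape(term)}\b", criterion, flags=re.IGNORECASE)
    ]


if __name__ == "__main__":
    print(find_vague_terms("Then it is fast and secure."))
    print(find_vague_terms("Then the API returns HTTP 403."))
```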

Disciplined wording makes the difference between criteria that guide delivery and criteria that generate debate and rework. Your aim is to express outputs and boundaries that anyone can verify without reading your mind.

Step 4 — Align to success metrics and test plans

Acceptance criteria must connect directly to success metrics and the tests that will verify them. In high-stakes delivery, that alignment ensures that your criteria are not just theoretical but operationally enforceable.

For GWT, reference measurable success metrics within the outcomes when relevant. If performance matters, include a threshold in the “Then” clause tied to a percentile (e.g., P95). If the response code or audit log is the signal of success, name it precisely. Doing so transforms scenarios into ready-made functional test cases and, in many teams, executable BDD tests. Positive, negative, and edge-case scenarios can be derived by varying the Given (state), the When (trigger), or the data inputs. Feature flags, permission levels, and temporal constraints (such as month boundaries or token expiry) should also be modeled in scenarios so that your coverage mirrors real-world usage.
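To show how varying the Given yields positive, negative, and edge cases, here is a sketch of three scenarios written as executable tests. `request_user_export` is a hypothetical stand-in for a real endpoint, and the 401 response for an expired token is an assumption. The `test_` names follow pytest’s collection convention if saved in a test file, and the script also runs on its own.

```python
from datetime import datetime, timedelta, timezone


def request_user_export(role: str, token_expires_at: datetime, now: datetime) -> int:
    """Hypothetical stand-in for the export endpoint; returns an HTTP status code."""
    if now >= token_expires_at:
        return 401  # assumed response for an expired token
    if role != "admin":
        return 403  # authenticated but not permitted
    return 200


NOW = datetime(2025, 9, 30, 12, 0, tzinfo=timezone.utc)


def test_admin_with_valid_token_gets_200():
    # Given an admin whose token is valid for another hour
    expires = NOW + timedelta(hours=1)
    # When they request the user export, Then the API returns 200
    assert request_user_export("admin", expires, NOW) == 200


def test_viewer_with_valid_token_gets_403():
    # Negative case: vary the Given (role), keep the When fixed
    assert request_user_export("viewer", NOW + timedelta(hours=1), NOW) == 403


def test_admin_with_token_expiring_now_gets_401():
    # Edge case: temporal boundary, the token expires at exactly this instant
    assert request_user_export("admin", NOW, NOW) == 401


if __name__ == "__main__":
    for test in (
        test_admin_with_valid_token_gets_200,
        test_viewer_with_valid_token_gets_403,
        test_admin_with_token_expiring_now_gets_401,
    ):
        test()
    print("all scenarios passed")
```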

For BAC, codify your KPIs, SLOs, and compliance controls explicitly. If you need an error rate below a threshold during rollout, state it. If zero P1 security defects is a hard gate, include it. Performance, availability, and security constraints fit naturally in BAC because each bullet represents a conformance criterion. From BAC, derive non-functional tests: load tests that confirm latency thresholds, security tests that validate access control and encryption, accessibility audits that check WCAG conformance, and observability checks that confirm logs, metrics, and traces are emitted with the required fields and retention policies. BAC items also map cleanly to monitoring and alert thresholds for canary or phased rollouts.
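BAC thresholds translate directly into rollout checks. The sketch below evaluates two illustrative bullets, “P95 latency ≤ 300 ms” and “error rate ≤ 1%” (the 1% budget is an assumption), against collected samples. It uses the nearest-rank percentile; monitoring systems may interpolate differently, so confirm which definition your tooling applies. The function names are hypothetical.

```python
def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Smallest sample such that at least pct% of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n (1-based)
    return ordered[rank - 1]


def check_rollout_bullets(latencies_ms: list[float], errors: int, requests: int,
                          p95_limit_ms: float, max_error_rate: float) -> dict[str, bool]:
    """Evaluate each BAC bullet separately so every one keeps its own pass/fail."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return {
        "p95_latency_ok": percentile_nearest_rank(latencies_ms, 95) <= p95_limit_ms,
        "error_rate_ok": errors / requests <= max_error_rate,
    }


if __name__ == "__main__":
    latencies = [float(ms) for ms in range(1, 101)]  # synthetic samples, 1..100 ms
    print(percentile_nearest_rank(latencies, 95))  # 95.0
    print(check_rollout_bullets(latencies, errors=3, requests=1000,
                                p95_limit_ms=300, max_error_rate=0.01))
```

Returning one result per bullet, rather than a single combined boolean, mirrors the atomicity rule: a failed gate tells you which criterion broke.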

To maintain rigor, use a brief mini-rubric to self-check every acceptance criterion, regardless of format:

1) Is it testable with a single pass/fail outcome? If not, split it.
2) Is it measurable with explicit thresholds or exact outputs? If not, quantify it.
3) Is it scoped with clear preconditions and actors? If not, add them.
4) Is it traceable to a requirement and a metric? If not, link it.
5) Is it written in concise, professional English suitable for an RFC or design document? If not, tighten the language.

This rubric raises the quality bar while preserving speed. It also prepares criteria for audit and incident review because each item stands as a defensible, verifiable statement.
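Parts of the rubric can be pre-screened automatically before human review. The sketch below checks questions 2 to 4 with deliberately crude heuristics: a digit in the text stands in for “has a threshold” (it misses exact outputs like a named error code), and only the actor, not preconditions, is checked for question 3. Questions 1 and 5 are left to reviewers. The `Criterion` fields and sample values are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Criterion:
    """One acceptance criterion plus the metadata the rubric asks for."""
    text: str
    actor: str = ""
    requirement_id: str = ""
    metric: str = ""


def rubric_gaps(c: Criterion) -> list[str]:
    """Return the rubric questions (2-4 only) that a criterion visibly fails."""
    gaps = []
    if not re.search(r"\d", c.text):
        gaps.append("2: no number, so likely no explicit threshold")
    if not c.actor:
        gaps.append("3: no named actor")
    if not (c.requirement_id and c.metric):
        gaps.append("4: not linked to both a requirement and a metric")
    return gaps


if __name__ == "__main__":
    print(rubric_gaps(Criterion("Performance is good")))
    print(rubric_gaps(Criterion("P95 response time ≤ 300 ms", actor="orders API",
                                requirement_id="REQ-12", metric="http_latency_p95")))
```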

Putting it all together in practice

Choosing between GWT and BAC is not about preference; it is about fitness for purpose. If your goal is to describe “what happens when” with explicit state and triggers, GWT provides narrative clarity and naturally yields functional test cases. If your goal is to define universal conditions, thresholds, or compliance rules that apply across many behaviors, BAC offers a concise, scannable checklist and a direct line to non-functional testing and monitoring.

In high-stakes delivery, the formats often work best together. Use BAC to define global constraints, SLOs, and cross-cutting rules. Then capture the most critical and error-prone flows as GWT scenarios that operate under those constraints. Convert between the two as needed: extract bullets from Then/And statements to consolidate thresholds, and expand bullets into scenarios where behavior is stateful or risk is high. Throughout, police your language for specificity, observability, measurability, and atomicity. Keep actors named, preconditions explicit, and outcomes tied to metrics.

When you do this consistently, acceptance criteria become a reliable bridge from requirements to code, tests, and operations. They preserve intent through implementation, reduce ambiguity in handoffs, and give your team a firm basis for go/no-go decisions. Most importantly, they make behavior testable, unambiguous, and auditable—the three essentials for delivering with confidence under pressure.

Use Given–When–Then (GWT) for stateful flows where preconditions, triggers, and observable outcomes matter; use Bullet Acceptance Criteria (BAC) for universal, measurable rules, thresholds, and compliance constraints.
Hold all criteria to a strict quality bar: be specific, observable, measurable, atomic, independent, consistent with constraints, and traceable to requirements and metrics.
Convert between formats as needed: expand ambiguous BAC bullets into GWT scenarios for state-dependent behavior, and distill GWT Then/And outcomes into numbered BAC bullets with explicit thresholds to avoid scenario explosion.
Tie every criterion to verification: name precise actors and preconditions, state verification signals and metrics (e.g., HTTP status codes, P95 latency), and use the mini-rubric to ensure each item has a single pass/fail outcome and is quantified, scoped, traceable, and clearly written.

Example Sentences

Choose Given–When–Then when behavior depends on state, role, or inputs, and use Bullet Acceptance Criteria for universal thresholds.
Given a premium user with a valid token, When they request the invoice PDF, Then the API returns 200, And the file download completes within 2 seconds at P95.
Bullet Acceptance Criteria should be specific, observable, measurable, atomic, independent, and traceable to requirements and metrics.
When scenarios explode due to many permutations, convert narrative Then lines into numbered bullets with explicit thresholds.
Use a hybrid: state global latency, security, and audit rules in BAC, and express critical checkout flows in GWT under those constraints.

Example Dialogue

Alex: We need acceptance criteria for the new payout feature—should we write GWT or bullets?

Ben: Start with GWT for the flow. Given a verified vendor, When they request payout, Then we return 200 and show the scheduled date.

Alex: Good. But our compliance team wants performance and logging rules too.

Ben: That’s BAC territory. Let’s add bullets like “P95 payout response ≤ 300 ms” and “Audit log records vendor_id, request_id, and timestamp.”

Alex: Perfect. We’ll keep the scenarios for edge cases, like expired tokens, and the bullets for cross-cutting constraints.

Ben: Exactly—hybrid approach: narrative for behavior, checklist for thresholds and policies.

Exercises

Multiple Choice

1. Which format is best for capturing cross-cutting non-functional requirements like latency SLOs and audit logging fields?

Given–When–Then (GWT)
Bullet Acceptance Criteria (BAC)
User Stories
ER Diagrams

Correct Answer: Bullet Acceptance Criteria (BAC)

Explanation: BAC excels at universal, measurable constraints (e.g., latency thresholds, security controls, audit logging) because bullets are concise, testable, and easy to scan.

2. You have many role/input permutations causing scenario explosion. What should you do according to the guidance?

Write more detailed GWT scenarios for every permutation
Convert Then/And lines into numbered bullets with explicit thresholds
Remove most edge cases to keep scenarios short
Use only adjectives like “fast” to summarize performance

Correct Answer: Convert Then/And lines into numbered bullets with explicit thresholds

Explanation: When GWT scenarios proliferate, convert narrative outcomes into BAC bullets with measurable thresholds to improve manageability and coverage.

Fill in the Blanks

Given an authenticated admin, When they request the user export, Then the API returns ___, And the file is generated in ≤ 5 seconds at P95.

Correct Answer: 200

Explanation: Observable, measurable outcomes should be explicit. A precise HTTP status (200) and a quantified performance threshold meet the specificity and measurability criteria.

Use a hybrid approach: declare global security and performance rules in ___, and capture critical flows with ___.

Correct Answer: BAC; GWT

Explanation: Global, cross-cutting constraints belong in BAC, while stateful flows and “what happens when …” scenarios are expressed in GWT.

Error Correction

Incorrect: Given a user, When they submit the form, Then it is fast and secure.

Correct Sentence: Given a verified user with a valid token, When they submit the form, Then the API returns 201, And the stored record is encrypted at rest with AES-256, And the request is served over TLS 1.3, And the P95 response time is ≤ 300 ms.

Explanation: The original uses vague adjectives (“fast,” “secure”). The correction specifies actors, observable outputs (201), and measurable/security details, aligning with specificity and measurability.

Incorrect: BAC: 1) Handle errors and success together; 2) Logging; 3) Performance is good.

Correct Sentence: BAC: 1) On validation failure, return HTTP 422 with error_code and field list. 2) Audit log includes user_id, request_id, and UTC timestamp. 3) P95 response time ≤ 300 ms during business hours (09:00–18:00 UTC).

Explanation: Each bullet must be atomic, observable, and measurable. The correction splits conditions, specifies outputs/fields, and quantifies performance instead of using vague terms.