Writing test cases is where most QA time goes — and where time pressure does the most damage. This module shows how to generate comprehensive, well-structured test cases from requirements in a fraction of the usual time, with better edge case and boundary coverage than manual writing alone typically achieves.
What makes a good test case — before we generate anything
AI can generate test cases fast. Whether those test cases are good depends on the quality of what you ask for — and your ability to review the output against what a well-written test case actually requires. Before using AI to produce test cases at scale, it's worth being precise about what distinguishes a useful test case from a time-wasting one.
❌ Weak test case
Title: Test auto quote functionality
Steps: 1. Open PolicyCenter. 2. Create a new auto quote. 3. Verify it works.
Expected result: Quote is created successfully
Problems: No specific test data. "Works" is undefined. No specific expected values. Not repeatable. Won't catch any specific defect.
✓ Effective test case
Title: Auto quote — Ontario driver, clean record, standard territory, verifies base premium calculation
Pre-conditions: PolicyCenter SIT environment, test driver profile QA-ON-001 (Ontario licence, DOB 1985-03-15, 0 convictions)
Steps: 1. Initiate new personal auto quote. 2. Enter driver QA-ON-001 details. 3. Select coverage: $1M liability, $500 collision deductible, comprehensive included. 4. Submit for rating. 5. Verify calculated premium against rating table RT-AUTO-ON-2026.
Expected result: Base premium = $1,247/year ± $5 (rounding tolerance). Territory code = ON-07. No surcharges applied.
The effective test case is specific, uses defined test data, produces a verifiable expected result, and is repeatable by any team member. When it fails, you know exactly what failed and why. When AI generates test cases, this is the quality bar you're reviewing against — and it's why the review step is non-negotiable.
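The same quality bar carries directly into automation. A minimal sketch, using the illustrative values from the sample test case above (base premium $1,247/year, ±$5 rounding tolerance), shows how a specific expected result becomes a verifiable assertion:

```python
# Tolerance check for the sample test case above. The premium and
# tolerance are the illustrative values from the example, not real
# rating output.

EXPECTED_PREMIUM = 1247.00
TOLERANCE = 5.00

def verify_premium(actual: float,
                   expected: float = EXPECTED_PREMIUM,
                   tolerance: float = TOLERANCE) -> bool:
    """Pass only when the calculated premium is within the stated tolerance."""
    return abs(actual - expected) <= tolerance

# A specific expected value catches rating drift that "quote created
# successfully" never would:
print(verify_premium(1249.50))  # within tolerance: True
print(verify_premium(1289.00))  # rating defect: False
```

The point is not the code; it is that "base premium = $1,247 ± $5" can be checked mechanically, while "quote is created successfully" cannot.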
Knowledge Check
AI generates 40 test cases for a PolicyCenter billing integration. You review them and find that 15 reference "the expected payment amount" without specifying what that amount is, and 8 have expected results that say "system behaves correctly." What should you do?
Correct answer: fix them before the cases go into the suite. Vague expected results are not a minor formatting issue; they make test cases non-executable and useless as defect evidence. "System behaves correctly" tells you nothing when testing finds it doesn't, and "the expected payment amount" without a value means different testers will verify different things. Trusting that "the tester will know" creates inconsistency between testers, and removing the cases loses potentially important scenarios that only need specific expected values added. Get the values from the billing specification or from the business rules confirmed in elicitation. This is the review work AI cannot do for you: populating expected results with values that come from your project knowledge. That is not AI's failure; it is the nature of what must come from your project context.
Generating test cases with AI — the production prompt
The prompt structure for test case generation is more specific than general BA prompting because test cases have precise structural requirements. The more context you give AI about the system under test, the business rules in play, and the test data constraints, the more usable the output will be. Here's the pattern that works in Guidewire insurance delivery contexts.
Prompt — test case generation for a Guidewire workflow
Role / context
I'm a QA engineer writing test cases for a Guidewire ClaimCenter implementation. The feature under test is the First Notice of Loss (FNOL) intake workflow for personal auto claims at an Ontario insurer.
Task
Generate a comprehensive set of test cases for the FNOL intake workflow. Cover: the happy path (successful intake completion), validation scenarios (missing required fields, invalid date of loss, future-dated loss), business rule scenarios (injury flag triggering specialist queue assignment, vehicle undriveable triggering towing auth), and negative scenarios (duplicate claim detection, lapsed policy at time of loss).
Business rules confirmed in design
Claims with any injury indicator must be assigned to the injury specialist queue, not the general adjuster pool. Vehicle undriveable flag triggers automated towing authorisation up to $350 — above $350 requires supervisor approval. Date of loss cannot be in the future or more than 3 years in the past. Duplicate detection: if same policy number + same date of loss already exists in ClaimCenter, system should warn but allow override with supervisor note. Lapsed policy at date of loss: claim shell still created but status = "Coverage Review Required."
Format
Each test case: ID, title, pre-conditions, test steps (numbered), expected result (specific and observable). Group by scenario type. For expected results involving specific system states (queue assignments, status values, approval thresholds), use the exact field names and values I've provided. Flag where test data needs to be set up in advance.
The key ingredient — business rules
Notice what made this prompt produce useful output: the specific business rules were included in the context. The $350 towing threshold, the 3-year lookback limit, the "Coverage Review Required" status for lapsed policies — these are project-specific details that came from your elicitation work. AI knows how to structure a test case; you know what the system is supposed to do. The combination produces test cases that are actually executable against your specific implementation.
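Those confirmed rules are also directly testable. A hedged sketch (function and constant names are hypothetical, not ClaimCenter fields) shows two of the FNOL rules above encoded as plain functions a test suite could exercise:

```python
from datetime import date, timedelta

# Thresholds from the confirmed business rules above.
TOWING_AUTO_APPROVE_LIMIT = 350.00   # above this, supervisor approval required
MAX_LOOKBACK_YEARS = 3

def date_of_loss_valid(date_of_loss: date, today: date) -> bool:
    """Date of loss cannot be in the future or more than 3 years past.
    Note: 365-day years are an approximation here; whether the real
    boundary counts leap days is itself a question to confirm with
    the BA -- exactly the kind of boundary testing should probe."""
    if date_of_loss > today:
        return False
    return date_of_loss >= today - timedelta(days=365 * MAX_LOOKBACK_YEARS)

def towing_needs_supervisor(amount: float) -> bool:
    """Automated towing authorisation up to $350; above that, supervisor."""
    return amount > TOWING_AUTO_APPROVE_LIMIT
```

Encoding the rules this way also forces the precision that test cases need: exactly $350.00 must be auto-approved, and $350.01 must not be.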
After generating 30–40 test cases from this prompt, your review covers four things: verifying every expected result has a specific, observable value; checking that test data requirements are feasible in your test environment; adding any scenario the AI missed that you know from your project context; and removing any case that duplicates another or tests something clearly out of scope for this cycle.
Edge cases and boundaries — where defects hide
The vast majority of production defects in insurance systems don't come from the happy path. They come from boundary conditions, edge cases, and the interaction between features that individually work fine. This is where experienced QA intuition matters most — and where AI provides the most leverage for less experienced testers who haven't developed that pattern recognition yet.
AI is particularly good at boundary value analysis and equivalence partitioning: the systematic edge-case identification techniques that testing theory defines but that time pressure usually prevents teams from applying comprehensively.
Edge case generation — PolicyCenter rating boundaries
The requirement: "Vehicle age affects the collision and comprehensive coverage rate. Vehicles over 10 years old are rated differently from vehicles 10 years and under."
What a typical test suite covers: A 5-year-old vehicle (clearly under 10), a 15-year-old vehicle (clearly over 10). Two test cases.
What AI generates when asked for boundary analysis:
1. Exactly 10 years old (boundary: which rate applies?)
2. 10 years and 1 day old (just over the boundary)
3. Vehicle with unknown manufacture date
4. Vehicle manufactured in the current year
5. Vehicle with a disputed year (different on the UVIP vs the driver's statement)
6. Policy effective date crossing the vehicle's 10th anniversary (mid-term boundary)
7. Classic car designation that overrides standard age rating
8. Write-off vehicle re-registered with a new year
The result: eight additional scenarios, several of which reveal ambiguity in the business rule that must be resolved before testing can proceed. Left unresolved, that ambiguity would have surfaced as a production defect.
Prompt — boundary and edge case analysis
Role / context
I'm a QA engineer writing test cases for a Guidewire PolicyCenter personal auto implementation. I need edge case and boundary analysis for a specific business rule.
Task
Generate edge case and boundary test scenarios for the following business rule: drivers under 25 years of age are assessed a young driver surcharge on all personal auto policies. Apply boundary value analysis and equivalence partitioning. Also identify any edge cases that arise from the interaction between driver age and other rating factors (multi-vehicle discount, winter tire discount, telematics enrolment).
Additional context
The surcharge applies to the youngest rated driver on the policy. If the youngest driver turns 25 during the policy period, the surcharge is removed at renewal, not mid-term. Learner drivers (G1/G2 in Ontario) are always surcharged regardless of age. The surcharge percentage is tiered: under 21 = 35%, 21–24 = 20%.
Format
For each scenario: the specific condition being tested, why it's a boundary or edge case, and what the expected system behaviour should be. Flag any scenario where the expected behaviour is ambiguous based on the business rules I've provided — those need BA/business confirmation before test case writing.
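The tiered surcharge rules above can be encoded as a sketch that makes the documented gap explicit. This covers only the age and licence-class tiers, not the renewal-timing rule; the rate for a G1/G2 driver aged 25 or older is undefined in the stated rules, so the sketch raises rather than guesses:

```python
def young_driver_surcharge_pct(age: int, licence_class: str) -> float:
    """One reading of the tiered surcharge rules.
    Under 21 = 35%, 21-24 = 20%, per the confirmed rules.
    Raises where the documented rules define no rate."""
    if age < 21:
        return 0.35
    if age <= 24:
        return 0.20
    if licence_class in ("G1", "G2"):
        # "Always surcharged regardless of age", but no rate is
        # defined for 25+. Surface this as a requirements gap
        # rather than inventing a value.
        raise ValueError(
            f"Surcharge rate undefined for {licence_class} driver aged {age}"
        )
    return 0.0
```

Writing the rule down this literally is itself a form of gap analysis: the `ValueError` branch is exactly the scenario that needs BA/business confirmation before a test case can state an expected result.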
Knowledge Check
AI boundary analysis identifies a scenario you hadn't considered: a policyholder adds a G2 driver aged 26 mid-term. The business rule says G1/G2 drivers are always surcharged regardless of age, but your existing business rules documentation doesn't address mid-term driver additions. What is the correct next step?
Correct answer: raise it as a requirements gap and get the expected behaviour confirmed before writing the test case. AI surfaced a genuine gap, and this is one of its most valuable contributions to testing: a scenario not covered in the business rules documentation cannot be tested until the expected behaviour is defined. Assuming the surcharge applies may be right, but mid-term endorsements, premium adjustments, and effective dating are all specific implementation decisions; testing against an assumption risks raising a defect that isn't one, or missing a real defect because you assumed the wrong expected result. Writing test cases for both possibilities and deferring to UAT is inefficient and wastes business user time on a question that should have been resolved in requirements. Surface it to the BA and business, get it confirmed, then test it properly. QA is the last line of defence, and finding gaps before testing is better than finding defects in production.
Automation and script support — AI for those who code
For QA engineers who write automation scripts, AI is a significant productivity accelerator — not for generating complete test suites autonomously, but for the specific tasks that make scripting slow: boilerplate code, selector identification, data generation, and framework-specific syntax that requires documentation lookup.
This section is for QA professionals who do some scripting. If that's not your role, the concepts in this section still apply through manual test case design — the principles of what to test and how to structure verification are the same whether the test runs manually or through a script.
⚙️
Boilerplate and framework setup
Page object models, test data factories, before/after hooks, configuration files — AI generates these patterns for Selenium, Playwright, Cypress, or your framework of choice. You describe the structure; AI produces the code you review and adapt.
🔢
Test data generation
Generating realistic but fake test data at scale — Ontario driver profiles, VINs, postal codes in specific rating territories, policy numbers — is exactly the kind of structured generation AI handles well. Verify output before loading into test environments.
🔍
Selector and locator strategies
Describing a UI element in plain language and asking AI to suggest robust locator strategies (preferring data-testid attributes, accessible names, or stable IDs over brittle XPath) is faster than manual inspection every time.
🐛
Script debugging assistance
Paste a failing test script and the error message into AI and ask it to identify likely causes. AI is good at spotting common automation failure patterns — timing issues, stale element references, incorrect wait strategies — faster than manual debugging.
🔄
Converting manual cases to scripts
Given a well-written manual test case with specific steps and expected results, AI can produce a skeleton automation script. The steps map to actions, the expected results map to assertions. You fill in the selectors and review the logic.
⚠️
What automation AI cannot do
Generate production-ready automation without review. Select the right automation strategy for a given system. Understand why a specific Guidewire UI element is difficult to locate. Make decisions about what should and shouldn't be automated. Those remain your professional judgments.
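The test data generation pattern from the cards above can be sketched in a few lines. Everything here is a hedged example: the field names and ID format echo the QA-ON-001 profile used earlier in this module, but the generated values are illustrative placeholders, not validated Ontario formats, and must be verified before loading into any test environment:

```python
import random
from datetime import date, timedelta

def generate_driver_profiles(n: int, seed: int = 42) -> list[dict]:
    """Deterministic fake driver profiles for test environments.
    Seeded so reruns produce identical data, which keeps test cases
    repeatable by any team member."""
    rng = random.Random(seed)
    profiles = []
    for i in range(n):
        # Date of birth somewhere in a 45-year window from 1960.
        dob = date(1960, 1, 1) + timedelta(days=rng.randrange(45 * 365))
        profiles.append({
            "id": f"QA-ON-{i + 1:03d}",
            "dob": dob.isoformat(),
            "licence_class": rng.choice(["G1", "G2", "G"]),
            "convictions": rng.choice([0, 0, 0, 1, 2]),  # skew toward clean records
        })
    return profiles

drivers = generate_driver_profiles(3)
```

The fixed seed is the design choice worth noting: random-looking but reproducible data means a failing test can be rerun against the exact same profile.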
The automation review standard
AI-generated automation scripts must be reviewed with the same rigour as any other code — perhaps more, because flawed test scripts can produce false passes that are worse than no automation at all. A script that asserts the wrong thing and passes every time gives false confidence. Review every assertion against the actual expected behaviour. Test the test before it goes into a regression suite.
Knowledge Check
You use AI to generate a Selenium script for testing the PolicyCenter auto quote premium calculation. The script runs, completes without errors, and reports a pass. You notice the assertion checks that the premium field "is not empty" rather than verifying the specific calculated value. What is the problem and what should you do?
Correct answer: fix the assertion now, before the script enters the suite. This is one of the most dangerous failure modes in automation: a test that passes but verifies the wrong thing. "Field is not empty" will pass whether the rating engine calculates correctly, incorrectly, or not at all; the engine could return $0, $999,999, or a completely wrong value and the test would still go green. Running the script more times only confirms that it passes consistently, not that the behaviour is correct, and adding it to regression with a plan to improve it later means every run until then provides false assurance. A regression suite full of tests like this goes green, the team proceeds to production, and the rating defect is discovered by a policyholder. Fix the assertion to verify the specific calculated value. This review discipline is what distinguishes professional automation from code that merely runs.
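The false pass takes only a few lines to reproduce. The premium values here are hypothetical; the weak check mirrors "field is not empty", while the strict check verifies the confirmed value with the ±$5 rounding tolerance used earlier in this module:

```python
# Suppose the rating engine has a defect and returns the wrong premium:
actual_premium = 2103.00     # defective output (hypothetical value)
expected_premium = 1247.00   # confirmed value from the rating table

# Weak assertion: passes no matter what the engine calculated.
weak_pass = actual_premium is not None and actual_premium != ""

# Strict assertion: fails, which is exactly what you want here.
strict_pass = abs(actual_premium - expected_premium) <= 5.00

print(weak_pass)    # True, giving false confidence
print(strict_pass)  # False, so the defect is caught
```

Both checks "run successfully"; only the strict one is a test.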
Module summary
✅
Test case quality standard
Specific expected results, defined test data, observable outcomes. AI generates volume and structure — you supply the specific values from business rules and specifications. Vague expected results must be fixed before cases enter any suite.
✅
Business rules in the prompt
The more specific project context you give AI, the more executable the output. Include the confirmed business rules, threshold values, and system states — AI knows how to structure a test case, you know what the system is supposed to do.
✅
Boundary and edge case coverage
AI surfaces boundary conditions and edge cases systematically. When AI identifies a scenario with no documented expected behaviour, raise it as a requirements gap before writing the test case — not after discovering a production defect.
✅
Automation review discipline
AI-generated scripts must be reviewed like any other code. A test that asserts the wrong thing and passes is worse than no test. Verify every assertion against the actual expected behaviour before any script enters a regression suite.
Ready for Module 03
Module 03 — Defect Intelligence — covers the other half of execution: writing defect reports that get fixed, analysing defect patterns to identify systemic issues, and using AI to communicate defects clearly to development teams and business stakeholders in ways that actually move things forward.
✓
Module 02 Complete
Test Cases at Scale is done. Continue to Module 03: Defect Intelligence.