Debugging and Root Cause
Debugging is where developer time disappears. Reading hundreds of lines of Guidewire server logs to find one meaningful error. Tracing a message through three integrated systems to find where it broke. AI doesn't replace your debugging judgment — but it compresses the pattern recognition work that makes finding the root cause so slow.
AI for log analysis — from wall of text to signal
Guidewire server logs are verbose. A single failed transaction can produce hundreds of lines of stack trace, warning messages, and informational output — with the actual root cause buried somewhere in the middle. Reading through them manually is time-consuming and cognitively taxing, especially when you're under pressure to restore a broken environment or meet an SIT deadline.
AI excels at this specific task: pattern recognition in large text volumes. Paste a log block and ask it to identify errors, their likely causes, and which entries are most relevant to your specific problem. What used to take 20 minutes of careful reading takes 3 minutes of structured prompting and review.
2026-03-15 14:23:41.891 [INFO] RatingEngine - Initiating rating for period 2026-03-15 to 2027-03-15
2026-03-15 14:23:42.103 [WARN] TerritoryResolver - Territory code lookup returned null for postalCode=K1A0A6, using default
2026-03-15 14:23:42.441 [INFO] RatingEngine - Base premium calculated: $1,247.00
2026-03-15 14:23:42.889 [ERROR] SurchargeCalculator - NullPointerException: Cannot invoke method getSurchargeRate() on null
2026-03-15 14:23:42.890 [ERROR] SurchargeCalculator - at SurchargeTable.lookupBySurchargeCode(SurchargeTable.gsx:47)
2026-03-15 14:23:42.891 [ERROR] SurchargeCalculator - at PolicyRatingPlugin.calculateSurcharges(PolicyRatingPlugin.gsx:203)
2026-03-15 14:23:42.892 [ERROR] RatingEngine - Rating failed for policy QA-ON-2026-00441, rolling back
2026-03-15 14:23:42.901 [WARN] TerritoryResolver - 14 additional postal codes returned null territory lookup in last 60 seconds
Immediate cause: NullPointerException in SurchargeCalculator.lookupBySurchargeCode() at line 47 — the surcharge lookup is returning null, causing a failure when getSurchargeRate() is called on the null object.
Contributing factor: TerritoryResolver warning — postal code K1A0A6 returned null territory lookup and used a default. This occurred 14 times in the last 60 seconds, suggesting a systematic territory code data issue that started around the same time as the bind failures.
Most likely root cause: The null territory code from TerritoryResolver may be flowing through to SurchargeCalculator, where the surcharge table lookup relies on a non-null territory code to return a valid surcharge record. A null territory → null surcharge lookup → NullPointerException.
First investigation steps: 1) Check SurchargeTable.gsx line 47 — is it null-checking the input before calling getSurchargeRate()? 2) Check whether the territory code data for K1A postal codes was recently changed or deleted in the reference data tables.
What AI can't confirm without you: Whether the territory→surcharge data relationship actually exists in this implementation's data model, and whether there was a recent reference data change in SIT that would explain the sudden onset.
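Before pasting, it can help to trim the wall of text down to signal: keep only the ERROR and WARN entries plus a few lines of surrounding context, so the model sees each failure and its lead-up without the informational noise. A minimal Python sketch of that filter (the function name and log format are illustrative, not part of any Guidewire tooling):

```python
import re

# Keep only ERROR/WARN lines plus `context` lines around each, so the
# AI prompt contains the failure and its lead-up, not the whole log.
# (Illustrative helper, not part of any Guidewire tooling.)
LEVEL_RE = re.compile(r"\[(ERROR|WARN)\]")

def extract_signal(log_lines, context=2):
    keep = set()
    for i, line in enumerate(log_lines):
        if LEVEL_RE.search(line):
            lo = max(0, i - context)
            hi = min(len(log_lines), i + context + 1)
            keep.update(range(lo, hi))
    return [log_lines[i] for i in sorted(keep)]

log = [
    "14:23:41.891 [INFO] RatingEngine - Initiating rating",
    "14:23:42.103 [WARN] TerritoryResolver - null territory for K1A0A6",
    "14:23:42.441 [INFO] RatingEngine - Base premium calculated",
    "14:23:42.889 [ERROR] SurchargeCalculator - NullPointerException",
]
# context=0 keeps only the flagged lines themselves
print("\n".join(extract_signal(log, context=0)))
```

A nonzero context value keeps the INFO lines immediately before each failure, which is often exactly the lead-up the AI needs to connect a warning to a later error.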
Integration failure diagnosis — tracing across systems
Integration failures in insurance IT are particularly hard to debug because the error often appears in a different system from where it originated. A ClaimCenter error may originate in a PolicyCenter data state. A billing failure may trace back to an integration message that was malformed three hops earlier. AI helps you reason through multi-system failure chains faster — especially when you provide logs from multiple systems and ask it to correlate them by timestamp and transaction ID.
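The correlation step itself can be sketched in a few lines: tag each log line with its source system and group lines by a shared transaction or policy ID, producing one cross-system timeline per transaction to paste into the prompt. A hypothetical Python sketch, assuming both systems log a recognizable policy number (the ID pattern below is an assumption; adjust it for your format):

```python
import re
from collections import defaultdict

# Merge log lines from several systems into one per-transaction timeline.
# Assumes each system logs a recognizable policy number; the pattern
# below matches IDs like QA-ON-2026-00441 and is an assumption.
TXN_RE = re.compile(r"\b[A-Z]{2}-[A-Z]{2}-\d{4}-\d{5}\b")

def correlate(*log_sources):
    """log_sources: (system_name, lines) pairs -> {txn_id: [tagged lines]}."""
    timeline = defaultdict(list)
    for system, lines in log_sources:
        for line in lines:
            match = TXN_RE.search(line)
            if match:
                timeline[match.group()].append(f"[{system}] {line}")
    return dict(timeline)

pc_log = ["14:23:42.892 ERROR Rating failed for policy QA-ON-2026-00441"]
cc_log = ["14:23:44.120 ERROR Notification rejected for QA-ON-2026-00441"]
merged = correlate(("PolicyCenter", pc_log), ("ClaimCenter", cc_log))
```

In real logs you would also sort each timeline by timestamp before pasting; the point here is that grouping by ID turns two unrelated-looking error logs into one readable failure chain.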
For example, adding: "The integration was working correctly until a product model change was deployed to SIT last Tuesday. Failures affect approximately 30% of notifications — specifically multi-vehicle policies with more than 2 vehicles." That single context addition narrows the investigation space from "anything in either system" to "the product model change and its effect on multi-vehicle policy data." AI will incorporate this immediately. Without it, AI analyses the logs in isolation — still useful, but slower to reach the relevant hypothesis. Always include what changed recently and what the affected subset has in common.
Root cause vs. symptom — the distinction that matters
AI debugging assistance is particularly good at identifying the immediate symptom — the error, the exception, the failing assertion. It's less reliable at distinguishing between the symptom and the underlying root cause, especially when the root cause is a data state, a configuration issue, or a design decision made elsewhere in the system.
The root cause discipline matters because fixing symptoms creates technical debt and recurrence. Fixing root causes creates stable systems. In insurance IT, where the same defect class can affect thousands of policies or transactions, a symptom fix that masks a data problem is significantly more dangerous than an unresolved error.
Symptom fix (fast, dangerous)
Add a null check to prevent the NullPointerException. The error goes away. The underlying data problem remains — and now it's silent. Policies with the bad territory code get processed with incorrect surcharge calculations. Nobody knows until a policyholder notices their premium is wrong, or an audit finds the discrepancy.
Root cause fix (slower, correct)
Identify the territory code data update that broke the K1A mapping. Fix the data. Add the null check as a defensive measure to prevent future failures from other data issues. Document what happened. Verify the fix against affected policies. This takes longer but produces a system that's actually working correctly.
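The difference between the two fixes can be made concrete. In this hedged Python sketch (the class and function names are hypothetical, not Guidewire APIs), the defensive null check is present, but it fails loudly instead of silently defaulting, so a future data gap surfaces as a visible error rather than a quietly wrong premium:

```python
import logging

logger = logging.getLogger("SurchargeCalculator")

class SurchargeDataError(Exception):
    """Raised when reference data needed for rating is missing."""

def lookup_surcharge_rate(surcharge_table, territory_code):
    # Hypothetical sketch, not the actual SurchargeTable API.
    if territory_code is None:
        # Fail loudly: a missing territory code means the premium
        # cannot be trusted, so rating must stop, not default.
        logger.error("Null territory code reached surcharge lookup")
        raise SurchargeDataError("territory code is null; refusing to rate")
    record = surcharge_table.get(territory_code)
    if record is None:
        logger.error("No surcharge record for territory %s", territory_code)
        raise SurchargeDataError(f"no surcharge record for {territory_code}")
    return record["rate"]
```

The symptom-only version of this fix would return a default rate in the `None` branch: the exception disappears, and so does every future chance of noticing the data problem.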
The debugging conversation — iterative AI assistance
Effective AI-assisted debugging is iterative, not single-prompt. You share context, AI generates a hypothesis, you test the hypothesis, you share what you found, AI refines its analysis. This conversation pattern — like pair programming with a fast but imperfect partner — often reaches the root cause faster than solo debugging, even accounting for AI's limitations.
The key discipline is providing feedback on each hypothesis: "That was right — I confirmed the territory code issue. Now the question is why the table update broke only K1A codes and not others." Each feedback cycle narrows the investigation space.
Turn 1 (you): Log analysis prompt with full context. AI identifies NullPointerException in SurchargeCalculator and territory code warning as likely related.
Turn 2 (you): "I confirmed the territory code warning — K1A codes are returning null from TerritoryResolver. The territory table was updated yesterday. But I also see the same postal code returning null in a different context — a mid-term endorsement on a 2021 policy that was working fine yesterday. What could cause a postal code that was working to start returning null?"
AI refines: Now with the additional context that previously-working codes are affected, AI can narrow to: table key change, lookup logic change, or caching issue. Each hypothesis has different investigation steps.
Turn 3 (you): "There's a caching layer on TerritoryResolver." AI's refined hypothesis points at the decisive question: was the cache last cleared before or after yesterday's table update?
This conversation reaches "stale territory code cache from before yesterday's table update" in three turns — probably 8–10 minutes. Solo debugging of a caching issue buried in a stack trace can take much longer.
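The failure mode the conversation converges on can be reproduced in a few lines: a read-through cache that captured a null result before the table update keeps serving that null until it is cleared. A simplified Python sketch (the class is illustrative, not the actual TerritoryResolver implementation):

```python
# Simplified read-through cache, illustrative only: once a result
# (including None) is cached, later table updates are invisible
# until clear() is called.
class TerritoryCache:
    def __init__(self, table):
        self._table = table
        self._cache = {}

    def lookup(self, postal_prefix):
        if postal_prefix not in self._cache:
            self._cache[postal_prefix] = self._table.get(postal_prefix)
        return self._cache[postal_prefix]

    def clear(self):
        self._cache = {}

table = {}                            # K1A mapping missing before the fix
cache = TerritoryCache(table)
assert cache.lookup("K1A") is None    # the null result is now cached

table["K1A"] = "ON-613"               # reference data is fixed...
assert cache.lookup("K1A") is None    # ...but the cache still serves null

cache.clear()
assert cache.lookup("K1A") == "ON-613"  # correct only after the cache clear
```

This is why "the table is correct but lookups still fail" is such a strong hint toward caching: the data and the behavior disagree, and a cache is the usual thing standing between them.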
AI occasionally generates confident-sounding hypotheses that are simply wrong — not partially right, but pointing in the wrong direction entirely. The indicator is when an AI hypothesis requires a series of increasingly complex explanations for why your observations don't quite match. If you're three steps into investigating an AI hypothesis and the evidence keeps pointing somewhere else, the hypothesis is probably wrong. Back up, share your observations fresh, and ask AI to reconsider. Sunk cost in an incorrect debugging path is a real time sink — one that solo developers without AI are equally susceptible to, but one that AI-assisted debugging doesn't automatically fix.
Module summary
Log analysis at speed
Paste logs with system context. AI identifies errors, correlates warnings, and surfaces the most likely root cause hypothesis. What used to take 20 minutes of careful reading takes 3 minutes of prompting and review. Your job: verify the hypothesis against the actual system.
Multi-system correlation
Integration failures appear in different systems than their origin. Include logs from both sides, specify what changed recently, describe the failure pattern. Context about what changed and what subset is affected narrows the investigation space dramatically.
Root cause vs. symptom
Symptom fixes mask problems. Use AI to challenge your proposed fix: is this treating the symptom or the cause? In insurance systems where silent incorrect calculations affect thousands of transactions, the distinction matters more than in most environments.
Iterative conversation; reset when stuck
Debugging is a dialogue. Feed back what you found, ask AI to refine. When evidence partially fits but doesn't fully explain after 60–90 minutes, reset with fresh observations — not deeper investigation of a probably-wrong hypothesis.
Module 04 — Technical Communication — addresses the other half of a developer's job: explaining to people who don't think in code what you built, why it broke, and what the plan is. AI helps close the gap between developer thinking and stakeholder understanding — faster documentation, clearer incident communication, better design explanations.
Debugging and Root Cause is done. Continue to Module 04: Technical Communication.