AcademyAI Fundamentals › Module 03

Verification, Risk, and Responsible Use

AI tools are powerful — and that power includes the ability to confidently produce wrong information, expose sensitive data, and introduce bias you don't notice. This module teaches you to use AI with the judgment your professional reputation demands.

⏱ 30–35 min · 3 knowledge checks · Critical for professional use
1

Hallucination — the most important failure mode to understand

In Module 01 you learned that LLMs don't look things up — they predict text based on patterns in their training data. This has a critical implication: when an AI is asked about something it doesn't have reliable data for, it doesn't say "I don't know." It generates a plausible-sounding answer anyway.

This is called hallucination — and it's the most dangerous characteristic of current AI tools for professional use, because the confident, authoritative tone of hallucinated content looks identical to accurate content. There is no "this might be wrong" flag. No asterisk. No hesitation in the writing style.

This is not a bug being fixed

Hallucination is not a defect that will be patched in the next update. It's a fundamental characteristic of how LLMs work. Even the most capable current models hallucinate. The frequency varies, but the risk never reaches zero. Your verification habits are your only protection.

Prompt
What are the specific capital requirements for Canadian insurance companies under OSFI's Life Insurance Capital Adequacy Test (LICAT) framework, and what was the minimum ratio as of Q3 2024?
AI Response (contains hallucinated content)
Under OSFI's LICAT framework, Canadian life insurance companies are required to maintain a minimum Total Ratio of 90%, with a supervisory target ratio of 100%. As of Q3 2024, the framework requires companies to hold Available Capital against a set of prescribed risk components including insurance risk, market risk, and operational risk. The specific ratios for Q3 2024 were confirmed at these levels in OSFI's quarterly guidance circular 2024-LC-07, which noted no changes to the minimum threshold from the prior quarter.
⚠️ Much of this response is fabricated. The general framework description may be directionally correct, but the specific ratio numbers, the guidance circular reference "2024-LC-07", and the claim that it "confirmed" anything are invented. This looks authoritative. Any professional who used these figures in a regulatory submission or client report without verifying them against the actual OSFI source would face serious consequences.

Notice what makes this dangerous: the response isn't entirely wrong. The general structure of LICAT is real. The risk components listed are real. The invented content is woven into real content so smoothly that someone without deep domain knowledge wouldn't spot it. This is precisely why hallucination is more dangerous than obvious nonsense — it requires you to know enough to verify.

Where hallucination is most likely to occur:

📅

Recent or current information

AI training has a cut-off date. Anything that happened after that date — regulatory updates, market data, recent events — may be fabricated if asked about directly.

🔢

Specific numbers and statistics

Percentages, thresholds, dates, version numbers. These are highly susceptible to fabrication because the model needs a specific value and will generate one if it doesn't have one.

📄

Citations and references

Specific document names, article titles, report numbers. AI frequently invents plausible-sounding citations that don't exist. Never use an AI-provided reference without verifying it exists.

⚖️

Regulatory and legal specifics

The names of regulations are often real. The specific clauses, thresholds, and requirements attributed to them may not be. The general direction can be right while the details are wrong.

Knowledge Check
You ask an AI to list the key requirements of PIPEDA (Canada's Personal Information Protection and Electronic Documents Act) for your company's privacy assessment. The AI produces a detailed, well-structured list. What is the most professionally responsible next step?
2

What verification actually looks like in practice

"Always verify" is easy advice to give and hard to follow systematically. The practical question is: verify what, how much, and using what sources? A proportionate approach based on the stakes of the content is more realistic than trying to source-check every sentence.

Content type: Specific regulations, thresholds, legal requirements
Stakes if wrong: Very high — regulatory, legal, financial consequences
Verification approach: Always verify from primary source (actual legislation, regulator website). Do not use AI as a reference.

Content type: Client-facing reports, status updates, formal documents
Stakes if wrong: High — your professional reputation, client trust
Verification approach: Review all facts and claims against your actual project data before sending. You own every word.

Content type: Technical specifications, architecture decisions
Stakes if wrong: Medium-high — incorrect specs create rework or production issues
Verification approach: Verify technical details against official documentation. Treat AI-generated code as requiring review, not as production-ready.

Content type: Internal analysis, option generation, brainstorming
Stakes if wrong: Lower — internal documents with human review before any action
Verification approach: Spot-check key facts. Directional accuracy is often sufficient. Flag anything that will inform a material decision.

Content type: First-draft text, summaries, communication drafts
Stakes if wrong: Low — you're editing it anyway before it goes anywhere
Verification approach: Read critically for tone and accuracy. Replace any specific claims you haven't verified. Inherently a review task.

A practical verification habit: When AI produces content that includes specific facts — numbers, dates, names, percentages, citations — pause and ask yourself: "Do I know this to be true independently, or am I relying on the AI for this?" If you're relying on the AI, that item needs a check before it goes anywhere formal.

This isn't about not trusting AI. It's about understanding what kind of tool you're working with. A capable drafter who occasionally invents facts is exactly what an LLM is. You wouldn't send a colleague's draft to a client without reading it — you shouldn't send an AI draft either.

Verification in practice — BA context

Situation: A BA uses AI to draft a section of a business case covering the regulatory context for a claims processing modernisation project. The AI produces a detailed paragraph referencing specific FSRA (Financial Services Regulatory Authority of Ontario) requirements for claims turnaround times.

The right response: Before this paragraph goes into the business case, the BA checks the actual FSRA guidance documents. Some of what the AI wrote is directionally correct. One specific turnaround time figure is wrong — the regulation applies to a different product line. The BA corrects it.

What saved them: They treated the AI output as a first draft, not a finished product. The business case went to the executive sponsor and the steering committee. An incorrect regulatory reference in that document would have been professionally damaging and potentially harmful to the project approval.

3

Privacy, confidentiality, and data risk

This is the area where IT professionals in regulated industries — insurance, financial services, healthcare — face the most significant professional and legal exposure from AI use. The risk is not hypothetical.

How the risk works: When you paste content into an AI tool, that content may be used as training data for future model versions, stored on servers in other jurisdictions, or accessible to the tool provider's staff for quality review. This is governed by the tool's terms of service, which most people don't read.

The practical implication: what you paste into a consumer AI tool should be treated as non-confidential, because you cannot guarantee it stays confidential. That means any content that would require a non-disclosure agreement, or that includes client data, employee information, financial data, or proprietary business information, needs careful handling.

[Diagram] Your device → sensitive client data pasted into a prompt is transmitted to a consumer AI tool, which may store the content, may use it for training, and is governed by terms of service you didn't read. The safer path is an enterprise AI tool: a data processing agreement is in place, content is not used for training, and your organisation's controls apply. When in doubt: anonymise, or use only information you'd share publicly.

Data sent to consumer AI tools is outside your control — enterprise tools with data processing agreements are the safer path for sensitive content

A practical data sensitivity framework — think about what you're about to paste into an AI tool in these terms:

Never paste

Client PII, financial data, NDA-protected content, employee records

Names, policy numbers, claim amounts, medical information, salary data, proprietary business processes, client contracts. This content has legal protection requirements that consumer AI tools don't satisfy. If you need AI help with this type of content, use your organisation's enterprise-licensed tool with a confirmed data processing agreement — or anonymise completely before pasting.

Use caution

Internal business strategy, unannounced projects, client names without permission

Details about your client's technology roadmap, your employer's strategic plans, project names before public announcement, or client names attached to project details. This may not be legally protected but could cause professional or commercial harm if it surfaced. Consider whether you need to include these specifics, or whether you can describe the situation more generically.

Generally fine

Anonymised scenarios, public information, general professional work

Meeting notes with names removed, draft documents without client identifiers, code that doesn't contain credentials or proprietary logic, questions about technology or regulations using publicly available information. When in doubt, anonymise — it almost never reduces the quality of the AI output significantly.
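For developers, the "anonymise before pasting" habit can be partly automated. The sketch below, in Python, redacts a few common identifier patterns before text leaves your machine. The policy-number format and the name list are hypothetical stand-ins — a real implementation would use your organisation's actual identifier formats and would still need human review, since regexes miss plenty.

```python
import re

def scrub(text: str, known_names: list[str]) -> str:
    """Replace policy numbers, email addresses, and known names with placeholders.

    A minimal sketch, not a complete PII solution. The policy-number
    pattern (three letters, dash, seven digits) is hypothetical.
    """
    # Hypothetical policy-number format: e.g. ABC-1234567
    text = re.sub(r"\b[A-Z]{3}-\d{7}\b", "[POLICY_NO]", text)
    # Email addresses
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)
    # Names you already know appear in the content
    for name in known_names:
        text = text.replace(name, "[NAME]")
    return text

snippet = "Claim for Jane Doe (policy ABC-1234567), contact jane.doe@example.com"
print(scrub(snippet, ["Jane Doe"]))
# -> Claim for [NAME] (policy [POLICY_NO]), contact [EMAIL]
```

Note the order: pattern-based substitutions run first, then literal name replacement, so a name embedded in an already-redacted email address isn't double-handled.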

Knowledge Check
A developer needs help debugging a piece of code that processes insurance claims. The code contains real policy numbers and claimant names from a test environment that was seeded with production data. They want to paste the code into ChatGPT for help. What should they do?
4

Bias, overconfidence, and knowing when not to use the tool

Two more failure modes matter for professional use, and they're subtler than hallucination because they don't announce themselves at all.

Failure mode: Bias

LLMs learn from human-generated text, which means they absorb human biases. These can surface in ways that matter professionally: assumptions about gender in job descriptions, demographic assumptions in scenario descriptions, cultural biases in "professional communication" standards, or systematic underrepresentation of certain perspectives. You won't always notice it — which is why reading critically, especially for anything involving people, is essential.

Bias in practice: Ask an AI to write a job description for a senior architect role and it may default to language and phrasing patterns associated with male candidates. Ask it to write a user persona for an insurance customer and it may make assumptions about age or economic situation. These aren't catastrophic failures — but they're the kind of thing that creates a diversity and inclusion problem if they make it into published materials without review.

The mitigation is the same as for hallucination: read critically before anything goes anywhere. Specifically ask yourself when reviewing AI output: has this made assumptions about people that I didn't intend or that could cause harm?

Failure mode: Overconfident tone

AI tools write with consistent confidence regardless of how certain they actually are. A sentence about something the model knows well and a sentence it's partially guessing on read identically. There's no "I'm less sure about this" hedge. This is why domain expertise matters — you need to know enough about the subject to notice when the tone doesn't match the reality of how certain something can be.

When not to use AI at all — this is as important as knowing when to use it:

Should I use AI for this task?
1. Does it involve confidential or sensitive data? If yes: anonymise first or use an enterprise tool.
2. Will errors have serious consequences? If yes: use AI for a draft and verify everything.
3. Is human judgment and context the core of the task? If yes: use AI for a first pass, but you make the call.
If the answer to all three is no: good candidate for AI assistance. Draft, structure, summarise, generate options.

A simple decision framework — not every task benefits from AI, and the ones with the highest stakes need the most human oversight
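The decision framework above can be sketched as a small function. This is a toy encoding of the three questions; the parameter names and return strings are paraphrases of the flowchart's outcomes, not anything prescribed by the module.

```python
def ai_suitability(confidential: bool, high_stakes: bool, judgment_core: bool) -> str:
    """Toy encoding of the 'Should I use AI for this task?' framework.

    The three parameters mirror the framework's three questions; the
    return strings paraphrase its outcomes.
    """
    if confidential:
        # Sensitive data: strip it out, or keep it inside org-controlled tooling
        return "anonymise first or use an enterprise tool"
    if high_stakes:
        # Serious consequences if wrong: AI drafts, a human verifies everything
        return "use AI for a draft; verify everything"
    if judgment_core:
        # Judgment-centred tasks: AI gives a first pass, the human decides
        return "AI for a first pass; you make the call"
    return "good candidate for AI assistance"

print(ai_suitability(confidential=False, high_stakes=True, judgment_core=False))
# -> use AI for a draft; verify everything
```

The ordering matters: confidentiality is checked first because no downstream answer can undo pasting sensitive data into the wrong tool.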

Specific situations where AI is not the right primary tool:

Performance conversations with employees
Sensitive client relationship discussions
Ethical decisions that require contextual judgment you hold
Situations where the institutional knowledge in your head is the critical input
Anything where "an AI suggested it" would not be an acceptable explanation if challenged

The goal is not to use AI everywhere. The goal is to use it well where it adds value — and to have enough judgment to know the difference.

Knowledge Check
You're a project manager who needs to give a team member feedback about repeatedly missing deadlines. You're considering using AI to draft the feedback for you. Which approach reflects the best professional judgment?
5

Module summary — what you've learned

Hallucination is structural

LLMs generate plausible text whether or not it's accurate. Specific facts, numbers, citations, and regulatory details must always be verified from primary sources before professional use.

Proportionate verification

Verify in proportion to the stakes. Client-facing and regulatory content needs source-level verification. Internal first drafts need critical reading. Every piece of AI output needs a human in the loop before it goes anywhere formal.

Privacy is a real risk

Consumer AI tools should never receive client PII, NDA-protected content, or sensitive organisational data. Anonymise before pasting, or use an enterprise tool with a data processing agreement.

Judgment is your job

Bias, overconfident tone, and lack of context mean that AI output always requires critical human review. The value of AI comes from what you do with its output — not from the output itself.

Ready for Module 04

Module 04 — Advance — brings the four modules together and helps you choose the role-specific pathway that fits your specialisation. You've built the foundation. Now you direct it.

Module 03 Complete

You've finished Judge — the professional judgment layer. Your progress has been saved. When you're ready, continue to Module 04: Advance.