Honest Failure Guard

Gives AI agents a structured stop-and-declare protocol for four failure states — low confidence, collapsed premise, exhausted search, and incomplete evidence — so agents label what they don't know instead of fabricating what they can't verify.

The Problem

Your AI agent doesn't know the answer.

So it gives you one anyway.

It doesn't flag that it's uncertain. It doesn't tell you the sources ran dry. It doesn't mention that the premise you gave it was broken. It generates fluent, confident-sounding output from a position of zero verified knowledge, and from the outside you have no way to tell the difference.

This is not a rare edge case. It is the default behavior of AI agents under pressure to produce output.

How It Works

This skill installs a structured stop-and-declare protocol for four specific failure states. When an agent hits one of these states, it stops presenting output as if it were certain, labels the failure explicitly, and waits for human direction.

The agent does not self-recover by lowering its standards. It does not proceed with fabricated confidence. It does not treat stopping as failure — stopping honestly is the correct behavior.

The Four Failure States

State 1 — Low Confidence

Trigger: The agent's internal assessment rates a claim's reliability as low because sources are absent, training data is the only basis, or the information is time-sensitive and unverified.

What the agent does:

Assigns a confidence score on a 1–5 scale, where 1 means the claim is well supported and 5 means it has no verifiable basis
At a score of 4 or higher: issues an abstention declaration
Labels the claim with [SPECULATION] or [UNVERIFIABLE]
Does not present the claim as fact

Output format:

[Confidence: 4/5 — abstaining]
Reason: Primary sources unavailable. Training data basis only.
What I can offer: [description of what is verifiable]
To proceed: [what the human would need to provide or confirm]

A score of 4 or 5 does not mean the agent refuses to help. It means the agent is honest about the epistemic state of its output and hands the decision back to the human.
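Here is a minimal Python sketch of the State 1 rule. It assumes a score has already been assigned to the claim. `Assessment`, `ABSTAIN_THRESHOLD`, and `declare_low_confidence` are hypothetical names and are not part of the skill. The score follows the scale direction described above, where a higher number means weaker support.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant mirroring "At a score of 4 or higher: issues an abstention declaration".
ABSTAIN_THRESHOLD = 4
EM_DASH = "\u2014"  # the documented header line uses an em dash


@dataclass
class Assessment:
    score: int              # 1-5; higher means weaker support, per the skill's scale
    reason: str             # why the claim cannot be treated as fact
    verifiable_part: str    # what the agent can still offer
    needed_to_proceed: str  # what the human would need to provide or confirm


def declare_low_confidence(a: Assessment) -> Optional[str]:
    """Return an abstention declaration, or None if the score is below the threshold."""
    if not 1 <= a.score <= 5:
        raise ValueError("score must be on the 1-5 scale")
    if a.score < ABSTAIN_THRESHOLD:
        return None
    return "\n".join([
        f"[Confidence: {a.score}/5 {EM_DASH} abstaining]",
        f"Reason: {a.reason}",
        f"What I can offer: {a.verifiable_part}",
        f"To proceed: {a.needed_to_proceed}",
    ])
```

Under these assumptions, a score of 3 returns `None` and no declaration is issued. A score of 4 or 5 returns the four-line block in the output format above.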

State 2 — Premise Collapse

Trigger: A core assumption that the task depends on turns out to be false, ambiguous, or contradicted by new information.

What the agent does:

Stops work immediately
Reports which premise collapsed and why
Maps the downstream impact (which parts of the task are now invalid)
Does not continue building on a broken foundation

Output format:

[Premise collapse detected]
Collapsed premise: [what was assumed]
Evidence of collapse: [what contradicts it]
Impact: [which parts of the task are affected]
To proceed: [what needs to be redefined]

Continuing to work after a premise collapse — hoping the output will still be useful — produces compounding errors that are expensive to untangle.
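The "map the downstream impact" step amounts to a reachability search over everything built on the collapsed premise. This sketch assumes the task is modeled as a hypothetical `depends_on` dictionary that maps each part of the task to the premises or parts it builds on. The skill itself does not prescribe this structure, and the function names are illustrative.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_impact(depends_on: Dict[str, List[str]], collapsed: str) -> Set[str]:
    """Return every part of the task that depends, directly or transitively, on `collapsed`."""
    # Invert the edges so we can walk from the premise toward the work built on it.
    dependents: Dict[str, List[str]] = {}
    for part, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(part)

    invalid: Set[str] = set()
    queue = deque([collapsed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in invalid:
                invalid.add(child)
                queue.append(child)
    return invalid


def declare_premise_collapse(depends_on: Dict[str, List[str]], collapsed: str,
                             evidence: str, needed: str) -> str:
    impact = sorted(downstream_impact(depends_on, collapsed))
    return "\n".join([
        "[Premise collapse detected]",
        f"Collapsed premise: {collapsed}",
        f"Evidence of collapse: {evidence}",
        f"Impact: {', '.join(impact) if impact else 'no dependent parts found'}",
        f"To proceed: {needed}",
    ])
```

A breadth-first walk through the inverted edges also catches indirect damage. For example, a summary built on a pricing section that was itself built on the collapsed premise appears under Impact, even though the summary never referenced the premise directly.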

State 3 — Search Exhaustion

Trigger: Multiple search attempts across distinct query families have returned zero valid, verifiable sources for a critical claim.

What the agent does:

Reports the exhaustion state
Lists the queries that were attempted
Applies [UNVERIFIABLE] to the affected claim
Does not fabricate a source to fill the gap

Output format:

[Search exhausted]
Claim: [what could not be verified]
Queries attempted: [list]
Result: [UNVERIFIABLE] — no confirming source found
To proceed: [alternative approach or human input needed]

An [UNVERIFIABLE] label is not a failure of the agent; it is an honest accounting of what the available tools could and could not confirm.
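The exhaustion trigger can be sketched together with the hard rule that "found" never means "probably exists". This sketch rests on several assumptions. `Source`, `SearchAttempt`, and the functions are hypothetical. "Multiple query families" is read as at least two distinct families (`min_families=2`). A source counts only if it was actually retrieved and its content supports the claim.

```python
from dataclasses import dataclass, field
from typing import List

EM_DASH = "\u2014"  # the documented "Result:" line uses an em dash


@dataclass
class Source:
    url: str
    accessed: bool        # the agent actually retrieved this page
    confirms_claim: bool  # the retrieved content supports the claim


@dataclass
class SearchAttempt:
    family: str  # hypothetical query-family label, e.g. "official filings"
    query: str
    sources: List[Source] = field(default_factory=list)


def is_valid(source: Source) -> bool:
    # "Found" means retrieved and confirming, never "probably exists".
    return source.accessed and source.confirms_claim


def search_exhausted(attempts: List[SearchAttempt], min_families: int = 2) -> bool:
    """True when enough distinct query families ran and none yielded a valid source."""
    families = {a.family for a in attempts}
    any_valid = any(is_valid(s) for a in attempts for s in a.sources)
    return len(families) >= min_families and not any_valid


def declare_search_exhaustion(claim: str, attempts: List[SearchAttempt], needed: str) -> str:
    queries = "; ".join(a.query for a in attempts)
    return "\n".join([
        "[Search exhausted]",
        f"Claim: {claim}",
        f"Queries attempted: {queries}",
        f"Result: [UNVERIFIABLE] {EM_DASH} no confirming source found",
        f"To proceed: {needed}",
    ])
```

A URL the agent never opened does not count, even if it looks like it would confirm the claim. This is exactly the "likely-existing URL" case the hard rules forbid.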

State 4 — Incomplete Evidence

Trigger: Sources exist but do not meet the quality threshold the claim requires because they are too old, too indirect, or too thin in their coverage of critical categories.

What the agent does:

Reports which evidence categories are missing
Flags affected claims with [STALE_DATA⚠️] or applies a confidence reduction
Does not proceed to high-confidence conclusions from low-quality evidence

Output format:

[Evidence incomplete]
Missing: [category — e.g., "primary source for X", "post-2023 data on Y"]
Available: [what was found and its limitations]
Confidence reduction applied: [affected claims]
To proceed: [what additional evidence would resolve this]
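Here is a sketch of the State 4 threshold check. It assumes each piece of evidence carries a category, a publication date, and a primary/indirect flag. The document does not specify a numeric threshold. The one-year staleness window (`max_age_days=365`) and all names here are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

STALE_LABEL = "[STALE_DATA⚠️]"


@dataclass
class Evidence:
    category: str    # hypothetical category name, e.g. "post-2023 data on Y"
    published: date
    primary: bool    # False means indirect or secondary reporting


def limitations(item: Evidence, today: date, max_age_days: int) -> List[str]:
    """List the reasons an item falls short of the (assumed) quality threshold."""
    problems = []
    if (today - item.published).days > max_age_days:
        problems.append(STALE_LABEL)
    if not item.primary:
        problems.append("indirect")
    return problems


def missing_categories(evidence: List[Evidence], required: List[str],
                       today: date, max_age_days: int = 365) -> List[str]:
    """Required categories with no item that meets the threshold, in the given order."""
    covered = {e.category for e in evidence if not limitations(e, today, max_age_days)}
    return [c for c in required if c not in covered]


def declare_incomplete_evidence(evidence: List[Evidence], required: List[str],
                                affected_claims: List[str], needed: str,
                                today: date, max_age_days: int = 365) -> Optional[str]:
    missing = missing_categories(evidence, required, today, max_age_days)
    if not missing:
        return None  # every required category is covered; no declaration needed
    available = [
        f"{e.category} ({e.published.isoformat()}: "
        f"{', '.join(limitations(e, today, max_age_days)) or 'meets threshold'})"
        for e in evidence
    ]
    return "\n".join([
        "[Evidence incomplete]",
        f"Missing: {', '.join(missing)}",
        f"Available: {'; '.join(available) or 'nothing found'}",
        f"Confidence reduction applied: {', '.join(affected_claims)}",
        f"To proceed: {needed}",
    ])
```

Recent but indirect evidence is neither stale nor sufficient in this sketch. It is reported under Available with its limitation, and its category still counts as missing.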

Human Override

Every failure state ends with a clear statement of what the human would need to provide or confirm to allow the agent to continue.

The human can:

Provide the missing information directly
Accept the labeled uncertainty and instruct the agent to proceed with caveats
Redefine the premise or scope
Confirm that a lower evidence standard is acceptable for this task

The agent does not self-override. "I'll try my best anyway" is not an honest failure response — it is the behavior this skill is designed to replace.
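The override rule behaves like a gate that only a human directive can open. In this sketch the `Override` members mirror the four options listed above. `Directive`, its `issued_by` field, and `FailureGate` are hypothetical. Requiring a payload for `PROVIDE_INFORMATION` is an added assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Override(Enum):
    # The four human options listed above.
    PROVIDE_INFORMATION = "provide the missing information"
    PROCEED_WITH_CAVEATS = "accept labeled uncertainty and proceed with caveats"
    REDEFINE_SCOPE = "redefine the premise or scope"
    LOWER_EVIDENCE_STANDARD = "accept a lower evidence standard"


@dataclass
class Directive:
    action: Override
    issued_by: str                 # hypothetical: "human" or "agent"
    payload: Optional[str] = None  # e.g. the missing information itself


class FailureGate:
    """Blocks continuation after a declared failure until a human directive arrives."""

    def __init__(self) -> None:
        self.declared: Optional[str] = None

    def declare(self, failure: str) -> None:
        self.declared = failure

    def can_continue(self) -> bool:
        return self.declared is None

    def resume(self, directive: Directive) -> bool:
        if self.declared is None:
            return True   # nothing to override
        if directive.issued_by != "human":
            return False  # the agent may not self-override
        if directive.action is Override.PROVIDE_INFORMATION and not directive.payload:
            return False  # assumption: "provide information" must carry the information
        self.declared = None
        return True
```

The key design choice is that `resume` is the only way to clear `declared`. No code path lets the agent relax the failure condition on its own.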

Before / After

Before:

"The company was founded in 2018 and currently has approximately 3,400 employees across 12 offices."
[None of this was verified. The agent produced plausible-sounding numbers from training data.]

After (with Honest Failure Guard):

[Confidence: 4/5 — abstaining]
Founding year: unverified. Employee count: [UNVERIFIABLE] — search returned no current primary source.
What I can confirm: [company exists, general industry context]
To proceed: provide a current source or confirm that approximate figures are acceptable.

Hard Rules

Fluent output is not evidence of accuracy. Confidence in tone does not correspond to confidence in facts. The agent's output quality check is epistemic, not stylistic.
Abstention is a valid and correct response. Stopping honestly when evidence is insufficient is not a failure — it is the behavior that prevents downstream harm from fabricated output.
Premise collapse stops work immediately. Building on a broken premise produces compounding errors. The agent stops, maps the damage, and waits.
[UNVERIFIABLE] is mandatory when search is exhausted. The agent does not invent a source, cite a likely-existing URL without accessing it, or lower the definition of "found" to mean "probably exists."
Every failure state ends with a clear path forward. The agent does not just stop — it tells the human what would be needed to continue. A dead end without a map is not a useful output.
Human override is required to proceed past a declared failure. The agent does not self-recover by redefining the failure condition. Only explicit human direction unlocks continuation.

More from @geko

Cascade-Safe Edit Guard

Catches cascade failures that partial checks miss: a pre-check names the risk before editing, a post-check demands evidence values afterward, and an auto-repair loop rereads the entire file every cycle until zero errors remain.

US$20

Anti-Hallucination Search Protocol

Forces AI agents to verify every factual claim through live search before output — attaching [SPECULATION] and [UNVERIFIABLE] labels to anything unconfirmed, so fabricated URLs, outdated numbers, and invented citations become visible instead of silent.

US$20

Task Completion Gate

Forces AI agents to prove a task is actually done before declaring it complete. 4-phase gate: instruction reconciliation, evidence anchoring (values only readable from the actual output), self-scoring (A/B/C/D), and reverse witness check. Eliminates "done" declarations that aren't.

US$20

Similar Agents

Amazon Listing Fixer

Find Amazon listing gaps and get safer copy, image, and A+ fixes.

US$9.99 / week

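SalesPathMonitor | Conversion-Path Uptime Monitoring (Daily)

Uses headless Playwright every morning to crawl and check your sales, conversion, and affiliate-link paths. Verifies that CTA elements exist, along with their destination domains and HTTP status. Covers public pages only (no login or tokens required). Makes zero LLM calls, so it adds no monthly API charges. Separates known failures from new ones to reduce Slack alert noise. Built for e-commerce, landing-page, and affiliate operators, it detects a broken conversion path within 24 hours.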

US$39

Agentic FDE: A Methodology for Self-Evolving Agent Systems

Make AI truly take hold in every part of your business: a genuine agentic FDE.

US$7,000