Agentic Guardrails Improvements

Synced from github.com/CoWork-OS/CoWork-OS/docs

Date: 2026-03-06
Branch: main
Commits: 82709df3 and prior

Goal: Reduce task failures caused by overly conservative guardrails and loop detection. Competing coding-agent products give the LLM more room to iterate, retry, and self-correct; these changes bring our agent closer to that baseline.


Root Causes Identified

| # | Root Cause | File | Impact |
|---|---|---|---|
| 1 | Tool disabled after just 2 consecutive failures | executor-helpers.ts | High |
| 2 | Exact-same tool call deduplicated after only 2 occurrences | executor.ts | High |
| 3 | Completion validator rejects short "done" responses even after tool success | completion-checks.ts | High |
| 4 | Progress score counts only file writes; reading/searching code scores as zero | progress-score-engine.ts | High |
| 5 | Per-step iteration cap too low (16); complex tasks regularly hit it | executor.ts | High |
| 6 | Global guardrail limits too conservative (50 iterations, 320 turns) | guardrail-manager.ts | Medium |
| 7 | Loop detection thresholds too aggressive; legitimate fix-cycles (read→edit→test) flagged | completion-checks.ts, executor.ts, executor-loop-utils.ts | Medium |
| 8 | 30-minute step timeout insufficient for deep work | executor-helpers.ts | Medium |
| 9 | Context compaction removes error-recovery context for active files | context-manager.ts | Medium |
| 10 | When a tool is disabled, the agent has no fallback suggestion and just stops | executor.ts, executor-helpers.ts | Medium |

Phase 1 — Relax Circuit Breakers

1a. Tool Failure Tracker (src/electron/agent/executor-helpers.ts)

Problem: A tool was permanently disabled after just 2 consecutive failures. A shell command failing twice (e.g., due to a typo being fixed) disabled run_command for the rest of the task.

| Constant | Before | After | Reason |
|---|---|---|---|
| MAX_TOOL_FAILURES | 2 | 5 | 2 was too aggressive; 5 allows normal iteration |
| cooldownMs | 5 * 60 * 1000 (5 min) | 2 * 60 * 1000 (2 min) | Faster re-enablement after cooling down |
| maxInputDependentFailures | 4 | 8 | Input-dependent errors (ENOENT, syntax) indicate the LLM is iterating, not stuck |

Tool-specific getMaxInputDependentFailures thresholds:

| Tool | Before | After |
|---|---|---|
| run_applescript | 8 | 12 |
| browser_* | 6 | 10 |
| run_command | (not set) | 10 (new entry) |

New method added: getDisabledToolNames(): string[] — returns names of currently disabled tools, used by graceful degradation (Phase 6).
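
The relaxed breaker can be pictured with a minimal sketch. Only the values of MAX_TOOL_FAILURES and cooldownMs and the getDisabledToolNames() signature come from this document; the class name, the other method names, the injectable clock, and the assumption that a tool is disabled once its consecutive failures reach the limit are illustrative, not the actual implementation in executor-helpers.ts.

```typescript
// Hypothetical, simplified tracker. Only the two constant values and the
// getDisabledToolNames() signature are taken from the document.
const MAX_TOOL_FAILURES = 5;
const COOLDOWN_MS = 2 * 60 * 1000;

interface FailureState {
  consecutiveFailures: number;
  disabledAt: number | null; // epoch ms at which the tool was disabled
}

class ToolFailureTrackerSketch {
  private readonly state = new Map<string, FailureState>();
  private readonly now: () => number;

  // The clock is injectable so the cooldown can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  recordFailure(tool: string): void {
    const s = this.state.get(tool) ?? { consecutiveFailures: 0, disabledAt: null };
    s.consecutiveFailures += 1;
    // Assumption: the tool is disabled when the streak reaches the limit.
    if (s.consecutiveFailures >= MAX_TOOL_FAILURES && s.disabledAt === null) {
      s.disabledAt = this.now();
    }
    this.state.set(tool, s);
  }

  recordSuccess(tool: string): void {
    this.state.delete(tool); // any success resets the consecutive-failure streak
  }

  isDisabled(tool: string): boolean {
    const s = this.state.get(tool);
    if (!s || s.disabledAt === null) return false;
    if (this.now() - s.disabledAt >= COOLDOWN_MS) {
      this.state.delete(tool); // cooldown elapsed: re-enable the tool
      return false;
    }
    return true;
  }

  getDisabledToolNames(): string[] {
    return Array.from(this.state.keys()).filter((tool) => this.isDisabled(tool));
  }
}
```

In this sketch a fifth consecutive run_command failure disables the tool, and it becomes usable again two minutes later.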

New run_command guidance in getAlternativeApproachGuidance:

  • "command not found" → suggest installing the package or using full path
  • "permission denied" → suggest checking permissions or using a different approach
  • "non-zero exit" → clarify the command itself failed (normal during development), not the tool

1b. Tool Call Deduplicator (src/electron/agent/executor.ts)

Problem: The same exact tool call (e.g., re-reading a file after editing it) was blocked after just 2 occurrences within 60 seconds.

| Parameter | Before | After | Reason |
|---|---|---|---|
| maxDuplicates | 2 | 3 | Allow one more retry of the same call |
| windowMs | 60_000 (60 s) | 120_000 (120 s) | Wider window reduces false resets |
| maxSemanticSimilar | 2 | 4 | Legitimate search refinements were blocked at 2 |
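
A sliding-window sketch of the exact-duplicate check illustrates how maxDuplicates and windowMs interact. The class name, the JSON-based call key, and the rule that a call is blocked once maxDuplicates identical calls already sit inside the window are assumptions for illustration; semantic-similarity matching (maxSemanticSimilar) is not modelled.

```typescript
// Hypothetical deduplicator; only the two constant values come from the document.
const MAX_DUPLICATES = 3;
const WINDOW_MS = 120_000;

class ToolCallDeduplicatorSketch {
  // Call key -> timestamps (ms) of recent identical calls.
  private readonly seen = new Map<string, number[]>();

  // Returns true when the call should be blocked as a duplicate.
  shouldBlock(tool: string, input: unknown, nowMs: number): boolean {
    const key = `${tool}:${JSON.stringify(input)}`;
    // Drop timestamps that have aged out of the window.
    const recent = (this.seen.get(key) ?? []).filter((t) => nowMs - t < WINDOW_MS);
    if (recent.length >= MAX_DUPLICATES) {
      this.seen.set(key, recent);
      return true;
    }
    recent.push(nowMs);
    this.seen.set(key, recent);
    return false;
  }
}
```

With these values, three identical read_file calls inside two minutes pass and the fourth is blocked; once the oldest call ages out of the window, the same call is allowed again.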

1c. Completion Validation (src/electron/agent/completion-checks.ts)

Problem: The validator required minimum response lengths (40–120 chars) even when tool execution had already confirmed task completion. Short, correct responses like "Done." were rejected.

Changes to evaluateDomainCompletion:

  • When hadAnyToolSuccess is true: accept any non-empty response without length checks. Tool evidence is sufficient proof of completion.
  • Only reject truly empty responses (no text at all) in tool-backed scenarios.

Reduced minimum lengths for non-tool-backed responses:

| Domain | Before | After |
|---|---|---|
| research | 80 chars | 60 chars |
| writing | 120 chars | 80 chars |
| general / auto | 40 chars | 20 chars |
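
The acceptance rule described above reduces to a few lines. This is a sketch under stated assumptions: the function name, the Domain type, and the boolean return are hypothetical, and the real evaluateDomainCompletion may handle more domains and return richer results. Only the hadAnyToolSuccess behaviour and the minimum lengths are taken from this section.

```typescript
// Hypothetical reduction of the length check; names other than
// hadAnyToolSuccess are illustrative.
type Domain = "research" | "writing" | "general" | "auto";

// "After" values from the table above.
const MIN_RESPONSE_CHARS: Record<Domain, number> = {
  research: 60,
  writing: 80,
  general: 20,
  auto: 20,
};

function isResponseLongEnough(
  response: string,
  domain: Domain,
  hadAnyToolSuccess: boolean,
): boolean {
  const trimmed = response.trim();
  if (trimmed.length === 0) return false; // truly empty is always rejected
  if (hadAnyToolSuccess) return true; // tool evidence replaces the length check
  return trimmed.length >= MIN_RESPONSE_CHARS[domain];
}
```

Under this rule "Done." is accepted after a successful tool call but is still rejected as a non-tool-backed answer in the general domain, where it falls below 20 characters.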

Phase 2 — Better Progress Scoring (src/electron/agent/progress-score-engine.ts)

Problem: Only file writes counted as forward progress. An agent exploring a codebase, reading files, and running searches received a progress score of zero — triggering the no-progress circuit breaker prematurely.

New signals added to ProgressScoreAssessment interface:

readOperations: number;
searchOperations: number;
toolSuccesses: number;

Updated score formula:

// Before
rawScore = stepCompleted * 1.0
         + writeMutations * 0.6
         + resolvedErrorRecoveries * 0.4
         - repeatedErrorPenalty          // (count - 1) * 0.8
         - emptyNoOpTurns * 1.0

// After
rawScore = stepCompleted * 1.0
         + writeMutations * 0.6
         + readOperations * 0.2          // NEW: read_file, list_directory, search_files, glob, find_in_file
         + searchOperations * 0.3        // NEW: web_search, web_fetch, search*
         + min(toolSuccesses, 5) * 0.1   // NEW: general tool successes, capped at 5
         + resolvedErrorRecoveries * 0.4
         - repeatedErrorPenalty          // (count - 1) * 0.4 — reduced from 0.8
         - emptyNoOpTurns * 0.3          // reduced from 1.0

Penalty reductions:

  • repeatedErrorPenalty multiplier: 0.8 → 0.4 (fixing the same error is iterating, not looping)
  • emptyNoOpTurns penalty: 1.0 → 0.3 (only penalize truly empty turns, not thinking/planning)
  • Empty turn threshold tightened: only turns with trimmed.length < 5 count as no-ops (was any whitespace-only message)
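
To make the effect of the new weights concrete, the updated formula can be written as a plain function. The ProgressSignals shape and the function name are hypothetical, and clamping the repeated-error penalty at zero is an assumption; the weights themselves are the "After" values listed above.

```typescript
// Hypothetical input shape; the real ProgressScoreAssessment may differ.
interface ProgressSignals {
  stepCompleted: number;
  writeMutations: number;
  readOperations: number;
  searchOperations: number;
  toolSuccesses: number;
  resolvedErrorRecoveries: number;
  repeatedErrorCount: number; // occurrences of the same error
  emptyNoOpTurns: number;
}

function rawProgressScore(s: ProgressSignals): number {
  // (count - 1) * 0.4, clamped at zero (the clamp is an assumption).
  const repeatedErrorPenalty = Math.max(0, s.repeatedErrorCount - 1) * 0.4;
  return (
    s.stepCompleted * 1.0 +
    s.writeMutations * 0.6 +
    s.readOperations * 0.2 +
    s.searchOperations * 0.3 +
    Math.min(s.toolSuccesses, 5) * 0.1 +
    s.resolvedErrorRecoveries * 0.4 -
    repeatedErrorPenalty -
    s.emptyNoOpTurns * 0.3
  );
}

// A pure exploration window: 4 reads, 2 searches, 6 successful tool calls, no writes.
const explorationScore = rawProgressScore({
  stepCompleted: 0,
  writeMutations: 0,
  readOperations: 4,
  searchOperations: 2,
  toolSuccesses: 6,
  resolvedErrorRecoveries: 0,
  repeatedErrorCount: 0,
  emptyNoOpTurns: 0,
});
```

Here explorationScore is 4 × 0.2 + 2 × 0.3 + 5 × 0.1, or about 1.9, whereas the previous formula would have scored the same window as 0 because it contains no writes, completed steps, or error recoveries.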

Phase 3 — Increase Execution Limits

3a. Per-Step Iteration Limits (src/electron/agent/executor.ts)

| Constant | Before | After |
|---|---|---|
| maxIterations (step) | 16 | 32 |
| maxIterations (follow-up) | 20 | 32 |
| maxMaxTokensRecoveries (step) | 3 | 6 |
| maxMaxTokensRecoveries (follow-up) | 3 | 6 |
| requestedMaxTurns default | 100 | 150 |

3b. Global Guardrail Limits (src/electron/guardrails/guardrail-manager.ts)

| Setting | Before | After |
|---|---|---|
| maxIterationsPerTask | 50 | 100 |
| defaultMaxAutoContinuations | 3 | 5 |
| defaultMinProgressScore | 0.25 | 0.15 |
| defaultLifetimeTurnCap | 320 | 500 |
| loopWarningThreshold | 8 | 12 |
| loopCriticalThreshold | 14 | 20 |
| globalNoProgressCircuitBreaker | 20 | 30 |

3c. Step Timeout (src/electron/agent/executor-helpers.ts)

| Constant | Before | After |
|---|---|---|
| DEEP_WORK_STEP_TIMEOUT_MS | 30 * 60 * 1000 (30 min) | 45 * 60 * 1000 (45 min) |

Phase 4 — Improve Loop Detection

4a. Loop Guardrail Configs (src/electron/agent/completion-checks.ts)

DEFAULT_LOOP_GUARDRAIL:

| Field | Before | After |
|---|---|---|
| stopReasonToolUseStreak | 6 | 8 |
| stopReasonMaxTokenStreak | 2 | 3 |
| lowProgressWindowSize | 8 | 12 |
| lowProgressSameTargetMinCalls | 6 | 8 |
| followUpLockMinStreak | 10 | 12 |
| followUpLockMinToolCalls | 10 | 12 |
| skippedToolOnlyTurnThreshold | 2 | 3 |

CODE_LOOP_GUARDRAIL:

| Field | Before | After |
|---|---|---|
| stopReasonToolUseStreak | 7 | 12 |
| stopReasonMaxTokenStreak | 3 | 4 |
| lowProgressWindowSize | 10 | 16 |
| lowProgressSameTargetMinCalls | 7 | 12 |
| followUpLockMinStreak | 6 | 10 |
| followUpLockMinToolCalls | 6 | 10 |
| skippedToolOnlyTurnThreshold | 3 | 5 |

NON_CODE_LOOP_GUARDRAIL:

| Field | Before | After |
|---|---|---|
| stopReasonToolUseStreak | 4 | 5 |
| lowProgressSameTargetMinCalls | 4 | 6 |
| followUpLockMinStreak | 8 | 10 |
| followUpLockMinToolCalls | 6 | 8 |
| skippedToolOnlyTurnThreshold | 2 | 3 |

4b. detectToolLoop Threshold (src/electron/agent/executor.ts + executor-loop-utils.ts)

| Location | Before | After |
|---|---|---|
| detectToolLoop default threshold parameter | 3 | 5 |
| maybeInjectToolLoopBreak hardcoded call | 3 | 5 |

Productive-cycle exemption added to detectToolLoop:

Before returning true for a detected loop, detectToolLoop now checks whether the wider call window (the last threshold + 3 calls) contains a mix of read-like and write-like tool categories:

const widerWindow = recentCalls.slice(-(threshold + 3));
const allCategories = new Set(widerWindow.map((c) => c.tool));
const hasReadLike = [...allCategories].some((cat) =>
  /^(read|search|list|glob|find|get)/.test(cat),
);
const hasWriteLike = [...allCategories].some((cat) =>
  /^(write|edit|create|run|execute|shell|command|browser)/.test(cat),
);
if (hasReadLike && hasWriteLike) return false; // it's a fix-cycle, not a loop

This prevents read_file → edit_file → run_command → read_file → … from being classified as a degenerate loop.
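
A self-contained sketch shows the exemption in action. The base loop criterion used here (the last threshold calls all hit the same tool) is an assumption, since the actual criterion inside detectToolLoop is not reproduced in this document; the function name and the ToolCall shape are likewise hypothetical. The wider-window check mirrors the snippet above.

```typescript
interface ToolCall {
  tool: string;
}

function detectToolLoopSketch(recentCalls: ToolCall[], threshold = 5): boolean {
  if (recentCalls.length < threshold) return false;

  // Assumed base criterion: the last `threshold` calls all use the same tool.
  const tail = recentCalls.slice(-threshold);
  const repeated = tail.every((c) => c.tool === tail[0]?.tool);
  if (!repeated) return false;

  // Productive-cycle exemption over the wider window (threshold + 3 calls).
  const widerWindow = recentCalls.slice(-(threshold + 3));
  const tools = Array.from(new Set(widerWindow.map((c) => c.tool)));
  const hasReadLike = tools.some((t) => /^(read|search|list|glob|find|get)/.test(t));
  const hasWriteLike = tools.some((t) =>
    /^(write|edit|create|run|execute|shell|command|browser)/.test(t),
  );
  return !(hasReadLike && hasWriteLike); // mixed read/write means a fix-cycle
}

const calls = (...names: string[]): ToolCall[] => names.map((tool) => ({ tool }));

// Five identical reads with nothing else in the window: flagged as a loop.
const pureLoop = detectToolLoopSketch(
  calls("read_file", "read_file", "read_file", "read_file", "read_file"),
);

// The same five reads preceded by an edit and a command run: exempted.
const fixCycle = detectToolLoopSketch(
  calls("edit_file", "run_command", "read_file", "read_file", "read_file", "read_file", "read_file"),
);
```

In this sketch pureLoop is true and fixCycle is false, because the wider window for the second sequence still contains edit_file and run_command.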


Phase 5 — Better Context Management

5a. Compaction Constants (src/electron/agent/executor-helpers.ts)

| Constant | Before | After |
|---|---|---|
| PROACTIVE_COMPACTION_TARGET | 0.50 | 0.55 |
| COMPACTION_SUMMARY_MAX_OUTPUT_TOKENS | 4096 | 6144 |
| COMPACTION_SUMMARY_MAX_INPUT_CHARS | 60_000 | 90_000 |
| COMPACTION_USER_MSG_CLAMP | 3000 | 4000 |
| COMPACTION_ASSISTANT_TEXT_CLAMP | 1500 | 2500 |
| COMPACTION_TOOL_USE_CLAMP | 800 | 1200 |
| COMPACTION_TOOL_RESULT_CLAMP | 1200 | 2000 |

5b. Smart Message Retention (src/electron/agent/context-manager.ts)

Problem: During context compaction, older messages containing error-recovery context and prior decisions about actively-worked files were removed, causing the agent to lose crucial context mid-task.

New helpers added:

  • extractFilePathsFromMessages(messages) — extracts file paths from message content using regex /(?:\/[\w.@-]+){2,}(?:\.\w+)?/g
  • messageReferencesActivePaths(message, activePaths) — returns true if a message mentions any of the active file paths

Enhancement to removeOlderMessagesWithMeta:

  1. Extracts file paths from the last 4 messages as the "active work context"
  2. When pruning older messages, reserves up to 15% of the token budget for messages that reference actively-worked files
  3. Active-file messages are kept preferentially; only removed if budget is fully exhausted
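
The retention steps above can be sketched as follows. The regex, the two helper names, the four-message window, and the 15% reservation come from this section. The SimpleMessage shape (plain string content), the selectReservedMessages helper, the newest-first walk, and the caller-supplied token estimator are assumptions for illustration and do not reflect the actual removeOlderMessagesWithMeta code.

```typescript
// Assumed message shape; real messages may carry structured content blocks.
interface SimpleMessage {
  role: "user" | "assistant";
  content: string;
}

const FILE_PATH_PATTERN = /(?:\/[\w.@-]+){2,}(?:\.\w+)?/g;

function extractFilePathsFromMessages(messages: SimpleMessage[]): Set<string> {
  const paths = new Set<string>();
  for (const message of messages) {
    for (const match of message.content.match(FILE_PATH_PATTERN) ?? []) {
      paths.add(match);
    }
  }
  return paths;
}

function messageReferencesActivePaths(
  message: SimpleMessage,
  activePaths: Set<string>,
): boolean {
  return Array.from(activePaths).some((p) => message.content.includes(p));
}

// Hypothetical helper: picks older messages to keep within the reserved 15%.
function selectReservedMessages(
  older: SimpleMessage[],
  activePaths: Set<string>,
  tokenBudget: number,
  estimateTokens: (m: SimpleMessage) => number,
): SimpleMessage[] {
  let remaining = Math.floor(tokenBudget * 0.15);
  const kept: SimpleMessage[] = [];
  // Walk newest-first so the most recent relevant context is preferred.
  for (let i = older.length - 1; i >= 0; i--) {
    const message = older[i];
    if (!message || !messageReferencesActivePaths(message, activePaths)) continue;
    const cost = estimateTokens(message);
    if (cost > remaining) continue;
    remaining -= cost;
    kept.unshift(message); // preserve chronological order
  }
  return kept;
}
```

A caller would derive the active set with extractFilePathsFromMessages(history.slice(-4)) and pass the remaining, older messages to selectReservedMessages before pruning the rest.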

Phase 6 — Graceful Degradation on Tool Failure

6a–6b. Tool Alternatives Injection (src/electron/agent/executor.ts)

Problem: When computeToolFailureDecision flagged a soft failure, the step loop immediately stopped — giving the agent no chance to switch approaches.

TOOL_ALTERNATIVES mapping added:

const TOOL_ALTERNATIVES: Record<string, string[]> = {
  browser_navigate: ["web_fetch", "web_search"],
  run_command:      ["run_applescript", "write_file"],
  edit_file:        ["write_file"],
  search_files:     ["glob", "list_directory"],
  web_search:       ["web_fetch", "browser_navigate"],
  web_fetch:        ["web_search"],
};

Graceful degradation logic (toolAlternativesInjected flag):

  • On first soft-failure stop signal (not hard failure), injects a system message listing available alternative tools for the disabled tool(s)
  • toolAlternativesInjected is set to true to prevent repeated injection
  • Loop continues for one more iteration, giving the LLM a chance to use alternatives
  • Only stops unconditionally on hard failures or if alternatives were already injected and tools still fail
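
The decision described in the bullets above can be sketched as a small function. TOOL_ALTERNATIVES and the toolAlternativesInjected flag come from this section; the function name, the FailureKind and Decision types, and the wording of the injected message are hypothetical.

```typescript
const TOOL_ALTERNATIVES: Record<string, string[]> = {
  browser_navigate: ["web_fetch", "web_search"],
  run_command: ["run_applescript", "write_file"],
  edit_file: ["write_file"],
  search_files: ["glob", "list_directory"],
  web_search: ["web_fetch", "browser_navigate"],
  web_fetch: ["web_search"],
};

type FailureKind = "soft" | "hard";

interface DegradationState {
  toolAlternativesInjected: boolean;
}

type Decision =
  | { action: "stop" }
  | { action: "continue"; systemMessage: string };

// Hypothetical decision helper for a tool-failure stop signal.
function handleToolFailureStop(
  kind: FailureKind,
  disabledTools: string[],
  state: DegradationState,
): Decision {
  // Hard failures, or a second soft failure after injection, end the loop.
  if (kind === "hard" || state.toolAlternativesInjected) return { action: "stop" };

  state.toolAlternativesInjected = true; // inject at most once per step
  const lines = disabledTools.map((tool) => {
    const alternatives = TOOL_ALTERNATIVES[tool] ?? [];
    return alternatives.length > 0
      ? `${tool} is unavailable; consider: ${alternatives.join(", ")}`
      : `${tool} is unavailable; no alternative is listed`;
  });
  return { action: "continue", systemMessage: lines.join("\n") };
}
```

The disabledTools argument would plausibly come from getDisabledToolNames(); the first soft failure yields one extra iteration with the hint attached, and any later stop signal in the same step ends the loop.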

6c. run_command Failure Guidance (src/electron/agent/executor-helpers.ts)

Enhanced getAlternativeApproachGuidance with run_command-specific messages:

| Error Pattern | Suggestion |
|---|---|
| command not found, No such file or directory | Check if the package needs installing; use full path; try a different command |
| permission denied, EACCES | Check file permissions; write a script file; use a different approach |
| exit code [1-9], non-zero exit, exited with | Clarify the command failed (normal during dev), not the tool; read the error and retry |
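
A pattern table like the one above maps naturally onto an ordered list of regular expressions. This is a minimal sketch: the function name, the case-insensitive matching, the first-match-wins ordering, and the exact suggestion strings are assumptions, not the contents of getAlternativeApproachGuidance.

```typescript
// Hypothetical pattern list; suggestion text paraphrases the table above.
const RUN_COMMAND_GUIDANCE: Array<{ pattern: RegExp; suggestion: string }> = [
  {
    pattern: /command not found|No such file or directory/i,
    suggestion:
      "Check whether the package needs installing, use the full path, or try a different command.",
  },
  {
    pattern: /permission denied|EACCES/i,
    suggestion:
      "Check file permissions, write a script file instead, or use a different approach.",
  },
  {
    pattern: /exit code [1-9]|non-zero exit|exited with/i,
    suggestion:
      "The command itself failed (normal during development), not the tool; read the error and retry.",
  },
];

// Returns the first matching suggestion, or null when no pattern applies.
function runCommandGuidance(errorText: string): string | null {
  const hit = RUN_COMMAND_GUIDANCE.find((entry) => entry.pattern.test(errorText));
  return hit ? hit.suggestion : null;
}
```

For example, runCommandGuidance("bash: jq: command not found") returns the install suggestion, while an unrecognised error such as a timeout returns null and leaves any generic guidance in place.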

Test Updates

Four tests were updated to match the new thresholds:

| Test File | Test Name | Change |
|---|---|---|
| executor-helpers-cache.test.ts | treats browser HTTP status failures as input-dependent | Loop i < 5 → i < 9 (browser threshold 6 → 10) |
| executor-helpers-cache.test.ts | treats missing-module runtime errors as input-dependent before disabling monty_run | Loop i < 3 → i < 7 (general threshold 4 → 8) |
| executor-step-failures.test.ts | fails fast after repeated policy-blocked tool-only turns with no text output | Added third mock LLM response (threshold 2 → 3) |
| completion-checks.test.ts | uses tighter follow-up lock thresholds for code-domain tasks | Expected values 6 → 10 (CODE_LOOP_GUARDRAIL change) |

Phase 7 — False-Positive Source Validation Guard (Post-deployment fix, 2026-03-07)

Root Cause

A build task ("Create an app about breaking news references") failed at finalization with:

Task missing source validation: release/funding claims require web_fetch sources with explicit publish dates.

The failure chain:

  1. Task prompt contained "breaking news" → taskLikelyNeedsWebEvidence() returned true
  2. The final agent response included "announcement/launch/release" wording — from HTML seed data the agent had just written into the app — not from real-world research claims
  3. The agent had not called web_fetch (correctly — it was building an app, not fetching news)
  4. requiresStrictResearchClaimValidation() returned true → finalization threw

Two complementary fixes applied in src/electron/agent/executor.ts:

Fix 7a: Build-task escape hatch in requiresStrictResearchClaimValidation

// Before
private requiresStrictResearchClaimValidation(candidate: string): boolean {
  if (!this.taskLikelyNeedsWebEvidence()) return false;
  return this.responseHasHighRiskResearchClaim(candidate);
}

// After
private requiresStrictResearchClaimValidation(candidate: string): boolean {
  if (!this.taskLikelyNeedsWebEvidence()) return false;
  // Build tasks that created files are not research tasks — the output describes
  // built artifacts, not factual claims about current events.
  const createdFiles = this.fileOperationTracker?.getCreatedFiles?.() || [];
  if (createdFiles.length > 0) return false;
  return this.responseHasHighRiskResearchClaim(candidate);
}

If the agent created any files during the task, the task is treated as a build task and research-claim validation is skipped.

Fix 7b: Build-intent signals in taskLikelyNeedsWebEvidence

The method triggered on broad keywords like "breaking", "news", "search" even when the task was about building a tool related to those topics, not fetching them. Added build-intent counterweight:

private taskLikelyNeedsWebEvidence(): boolean {
  const prompt = `${this.task.title}\n${this.task.prompt}`.toLowerCase();
  const researchSignals = ["news", "latest", "today", "trending", "breaking", ...];
  if (!researchSignals.some((signal) => prompt.includes(signal))) return false;
  // Build/creation tasks mentioning news-related keywords don't need web evidence.
  const buildSignals = [
    "create an app", "build an app", "make an app", "write an app",
    "create a tool", "build a tool", "create a website", "build a website",
    "implement", "develop an app", "develop a tool",
  ];
  if (buildSignals.some((signal) => prompt.includes(signal))) return false;
  return true;
}

This also prevents misleading step-context prompts (lines 16901–16909) from instructing the LLM to fetch news when it should be building something.
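
The combined effect of fixes 7a and 7b can be checked with a pure-function sketch against the prompt that originally failed. The free-function signatures and the hasHighRiskClaim parameter are illustrative stand-ins for the private methods and for responseHasHighRiskResearchClaim; the research-signal list contains only the entries visible above, because the source list is truncated.

```typescript
// Only the signals shown in the document; the real list is longer (elided).
const RESEARCH_SIGNALS = ["news", "latest", "today", "trending", "breaking"];

const BUILD_SIGNALS = [
  "create an app", "build an app", "make an app", "write an app",
  "create a tool", "build a tool", "create a website", "build a website",
  "implement", "develop an app", "develop a tool",
];

function taskLikelyNeedsWebEvidence(title: string, prompt: string): boolean {
  const text = `${title}\n${prompt}`.toLowerCase();
  if (!RESEARCH_SIGNALS.some((signal) => text.includes(signal))) return false;
  // Fix 7b: build intent outweighs news-related keywords.
  if (BUILD_SIGNALS.some((signal) => text.includes(signal))) return false;
  return true;
}

function requiresStrictResearchClaimValidation(
  title: string,
  prompt: string,
  createdFiles: string[],
  hasHighRiskClaim: boolean, // stand-in for responseHasHighRiskResearchClaim
): boolean {
  if (!taskLikelyNeedsWebEvidence(title, prompt)) return false;
  // Fix 7a: a task that created files is treated as a build task.
  if (createdFiles.length > 0) return false;
  return hasHighRiskClaim;
}

// The prompt from the failing task no longer counts as a research task.
const failingTaskNeedsEvidence = taskLikelyNeedsWebEvidence(
  "",
  "Create an app about breaking news references",
);
```

Here failingTaskNeedsEvidence is false because "create an app" matches a build signal. A prompt such as "Put trending topics in a dashboard" has no build signal, so it is only exempted by fix 7a once the agent has created at least one file.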


Files Modified

| File | Nature of Changes |
|---|---|
| src/electron/agent/executor-helpers.ts | Circuit breaker constants, timeout, compaction constants, run_command guidance, getDisabledToolNames() |
| src/electron/agent/executor.ts | Deduplicator params, iteration limits, detectToolLoop threshold + exemption, tool alternatives + graceful degradation, source-validation false-positive fix |
| src/electron/agent/completion-checks.ts | Loop guardrail configs (3 presets), domain completion validation |
| src/electron/agent/progress-score-engine.ts | Progress formula, new signals (read/search/tool), reduced penalties |
| src/electron/guardrails/guardrail-manager.ts | Global limit defaults (7 constants) |
| src/electron/agent/context-manager.ts | Smart message retention, extractFilePathsFromMessages, messageReferencesActivePaths |
| src/electron/agent/executor-loop-utils.ts | detectToolLoop call threshold 3 → 5 |
| src/electron/agent/__tests__/executor-helpers-cache.test.ts | Updated 2 threshold-dependent test loops |
| src/electron/agent/__tests__/executor-step-failures.test.ts | Added third mock response to match new threshold |
| src/electron/agent/__tests__/completion-checks.test.ts | Updated expected followUpLock values |
| src/electron/agent/__tests__/context-manager-compaction.test.ts | Increased filler message size to exceed 8,000-token compaction threshold |

Verification

  • TypeScript: npx tsc --noEmit reports no new errors (unrelated errors in GuardrailSettings.tsx pre-existed)
  • Tests: npx vitest run src/electron/agent/__tests__/ (624/624 passed)