
Evolving Agent Intelligence

Synced from github.com/CoWork-OS/CoWork-OS/docs

CoWork OS has a layered memory runtime, a full personality engine, 15+ channels, and a playbook system that auto-captures what worked. The Evolving Agent Intelligence layer connects these systems so the agent visibly improves over time — reducing correction overhead, aligning to communication preferences, and surfacing quantifiable ROI metrics.

All improvements are opt-in (admin-toggleable), rate-limited, and governed by the existing guardrail system. None of them changes the security or local-first architecture.


0. Runtime Visibility

The learning loop is now visible as part of the task and operator experience, not just as backend plumbing.

  • Task completion emits a standardized learning progression that shows memory capture, playbook reinforcement, and skill proposal review state
  • Mission Control and task detail views render the same progression so operators can inspect the evidence behind each step
  • Unified recall spans tasks, messages, files, workspace notes, memory entries, and knowledge-graph context behind one search experience
  • Persistent shell sessions preserve cwd, env deltas, and aliases per task/workspace for longer operator workflows
  • Provider routing and fallback decisions are surfaced so automatic model changes are legible in real time

This layer is additive: it makes the learning loop easier to understand and trust while preserving CoWork OS's core surfaces of desktop control, channels, inbox, devices, and governed automation.

One concrete expression of this philosophy is llm-wiki: instead of letting research disappear into transient chat, CoWork can maintain a durable workspace-local knowledge base with raw-source preservation, linked notes, and deterministic vault-health analysis. See LLM Wiki.

Retry-time reuse

The learning loop also feeds the live recovery path, not just post-task analytics.

  • Retrying turns reuse recent session evidence through SessionRecallService.search(...)
  • Pending verification checklist items are preserved across retries so recovery does not silently drop unfinished checks
  • Planning retries can pull in compact playbook guidance, while execution and follow-up retries rely on the normal layered memory runtime plus targeted recovery hints

This keeps retries closer to "continue from what already worked" than "start over from scratch."


1. Layered Memory Runtime

File: src/electron/memory/MemorySynthesizer.ts

Problem

The old monolithic synthesized-memory block mixed durable facts, broad archive recall, and tactical hints into one injected blob. That made it too easy for archive memory to compete with higher-signal user/workspace facts, and it blurred the line between always-on memory and turn-specific recall.

Solution

MemorySynthesizer.synthesize() now uses an explicit wake-up model with distinct roles and budgets:

  1. L0 Identity in <cowork_hot_memory> for the facts that should stay front-and-center:
    • curated hot-memory entries from CuratedMemoryService
    • user profile facts
    • relationship memory
    • workspace-kit essentials
  2. L1 Essential Story in <cowork_structured_memory> for ranked supporting context:
    • playbook patterns
    • current knowledge graph entities/relationships
    • daily summaries
    • archive memory only when explicitly enabled
  3. L2 Topic Packs as explicit .cowork/memory/topics/*.md loads through memory_topics_load
  4. L3 Deep Recall as explicit tools:
    • search_quotes
    • search_sessions
    • search_memories

Workspace kit context still renders separately before memory sections, and only L0 + L1 are injected into the live prompt by default.
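
As a rough illustration of the default injection behavior, the sketch below renders only the L0 and L1 blocks and includes archive fragments only when explicitly enabled. The `MemoryLanes` and `SynthesisOptions` shapes, the helper names, and the "Archive" section title are assumptions made for this example; they are not taken from MemorySynthesizer.ts.

```typescript
// Hypothetical input shapes; the real MemorySynthesizer types are not shown here.
interface MemoryLanes {
  curated: string[];
  userFacts: string[];
  playbook: string[];
  entities: string[];
  summaries: string[];
  archive: string[];
}

interface SynthesisOptions {
  // Mirrors the documented default: defaultArchiveInjectionEnabled: false
  archiveInjectionEnabled: boolean;
}

// Render one "## Title" section, or an empty string when there is nothing to show.
function section(title: string, items: string[]): string {
  if (items.length === 0) return "";
  return `## ${title}\n${items.map((item) => `- ${item}`).join("\n")}`;
}

// Wrap the non-empty sections in a tag, or return an empty string if all are empty.
function wrap(tag: string, sections: string[]): string {
  const body = sections.filter((s) => s.length > 0).join("\n\n");
  return body.length === 0 ? "" : `<${tag}>\n${body}\n</${tag}>`;
}

function renderWakeUpMemory(lanes: MemoryLanes, opts: SynthesisOptions): string {
  // L0 Identity: always-on, highest-signal facts.
  const hot = wrap("cowork_hot_memory", [
    section("Curated Hot Memory", lanes.curated),
    section("You & the User", lanes.userFacts),
  ]);
  // L1 Essential Story: ranked supporting context; archive only when enabled.
  const structured = wrap("cowork_structured_memory", [
    section("Past Task Patterns", lanes.playbook),
    section("Known Entities", lanes.entities),
    section("Recent Summaries", lanes.summaries),
    opts.archiveInjectionEnabled ? section("Archive", lanes.archive) : "",
  ]);
  return [hot, structured].filter((block) => block.length > 0).join("\n\n");
}
```

L2 topic packs and L3 deep recall are deliberately absent from this sketch because, per the description above, they are loaded through explicit tools rather than injected.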

Configuration

Default runtime behavior:

| Setting | Default |
| --- | --- |
| L0 Identity injection | on |
| L1 Essential Story injection | on |
| Archive memory injection | off (defaultArchiveInjectionEnabled: false) |
| Quote/session/archive recall | tool-driven (search_quotes, search_sessions, search_memories) |
| Topic packs | tool-driven (memory_topics_load) |

Curated-memory guardrails

  • Curated entry content is capped at 320 characters
  • match strings for replace/remove are capped at 120 characters
  • memory_curate supports stable id values so replace/remove operations can be deterministic
  • Curated file sync into .cowork/USER.md and .cowork/MEMORY.md is serialized per workspace and retried on file-change races
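
The length caps above could be enforced with a small validator along these lines. This is a sketch only: the `CurateOp` shape is hypothetical, and whether the real memory_curate tool rejects or truncates oversized input is not stated here, so rejection is an assumption.

```typescript
const MAX_CURATED_CONTENT_CHARS = 320; // documented cap on curated entry content
const MAX_MATCH_CHARS = 120;           // documented cap on replace/remove match strings

// Hypothetical operation shape for illustration.
type CurateOp =
  | { action: "add"; content: string }
  | { action: "replace"; content: string; id?: string; match?: string }
  | { action: "remove"; id?: string; match?: string };

// Returns a list of problems; an empty list means the operation passes the caps.
function validateCurateOp(op: CurateOp): string[] {
  const errors: string[] = [];
  if (op.action !== "remove" && op.content.length > MAX_CURATED_CONTENT_CHARS) {
    errors.push(`content exceeds ${MAX_CURATED_CONTENT_CHARS} characters`);
  }
  if (op.action !== "add") {
    if (op.match !== undefined && op.match.length > MAX_MATCH_CHARS) {
      errors.push(`match exceeds ${MAX_MATCH_CHARS} characters`);
    }
    // A stable id makes the target deterministic; otherwise a match string is needed.
    if (op.id === undefined && op.match === undefined) {
      errors.push("replace/remove needs an id or a match string");
    }
  }
  return errors;
}
```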

Sources

The runtime now thinks about sources by wake-up layer instead of one flat synthesis list:

| Layer | Sources |
| --- | --- |
| L0 Identity | CuratedMemoryService, UserProfileService, RelationshipMemoryService, workspace-kit essentials |
| L1 Essential Story | PlaybookService, KnowledgeGraphService, DailyLogSummarizer, optional archive fragments from MemoryService |
| L2 Topic Packs | memory_topics_load over .cowork/memory/topics/*.md |
| L3 Deep Recall | search_quotes, search_sessions, search_memories |

daily_summary fragments come from .cowork/memory/summaries/<YYYY-MM-DD>.md files produced by DailyLogSummarizer. Raw daily log files (.cowork/memory/daily/) are never injected into prompts.

Output format

<cowork_hot_memory>
## Curated Hot Memory
- [Curated entry]

## You & the User
- [UserProfile fact]
- [RelationshipMemory item]
</cowork_hot_memory>

<cowork_structured_memory>
## Past Task Patterns
- [Playbook entry]

## Known Entities
- [KnowledgeGraph entity]

## Recent Summaries
[Daily summary snippet]
</cowork_structured_memory>

2. Adaptive Style Engine

File: src/electron/memory/AdaptiveStyleEngine.ts

Problem

PersonalityManager has rich response-style settings (emoji usage, response length, explanation depth, code comment style), but they are entirely manual. The agent never learns from observed user behaviour.

Solution

AdaptiveStyleEngine observes every user message and feedback signal, then gradually shifts PersonalityManager settings within configurable rate limits.

Signals observed:

| Signal | How detected | Effect |
| --- | --- | --- |
| Short messages | Rolling average of last 50 message lengths | Shifts responseLength toward "terse" |
| Emoji in messages | Fraction of messages containing emoji | Shifts emojiUsage toward "moderate" |
| Technical vocabulary | Density of tech terms (docker, kubernetes, nginx, …) | Shifts explanationDepth toward "expert" |
| "Too verbose" feedback | Regex on feedback reason | Shifts responseLength toward "terse" |
| "More detail" feedback | Regex on feedback reason | Shifts responseLength toward "detailed" |
| "No emoji" feedback | Regex on feedback reason | Shifts emojiUsage toward "none" |
| Expert/beginner signals | Regex on feedback reason | Shifts explanationDepth |
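
The "rolling average of last 50 message lengths" signal can be pictured as a fixed-size window over recent messages. The class below is an illustrative sketch, not the engine's code; `RollingAverage`, `suggestsTerse`, and the 60-character threshold are hypothetical.

```typescript
// Keeps the most recent `windowSize` samples and reports their mean.
class RollingAverage {
  private values: number[] = [];
  private windowSize: number;

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  push(value: number): void {
    this.values.push(value);
    // Drop the oldest sample once the window is full.
    if (this.values.length > this.windowSize) this.values.shift();
  }

  average(): number {
    if (this.values.length === 0) return 0;
    const sum = this.values.reduce((total, v) => total + v, 0);
    return sum / this.values.length;
  }

  count(): number {
    return this.values.length;
  }
}

// Hypothetical threshold: the real cut-off for "short messages" is not documented here.
const TERSE_THRESHOLD_CHARS = 60;

// Only suggest a shift toward "terse" once the window holds enough evidence.
function suggestsTerse(lengths: RollingAverage, minSamples: number): boolean {
  return lengths.count() >= minSamples && lengths.average() < TERSE_THRESHOLD_CHARS;
}
```

A usage pattern would be one `new RollingAverage(50)` per user, with `push(text.length)` called from the same place that observes each message.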

Rate limiting: At most adaptiveStyleMaxDriftPerWeek one-level shifts are applied per 7-day window. The counter resets weekly, and state is persisted via SecureSettingsRepository.
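
One way to read the weekly drift limit is as a fixed-window counter. The sketch below assumes the window starts at a stored timestamp and resets once seven days have elapsed; `DriftState` and `tryConsumeShift` are hypothetical names, and the real engine may track its window differently.

```typescript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical persisted state (the document says state lives in SecureSettingsRepository).
interface DriftState {
  windowStart: number;    // epoch milliseconds when the current window began
  shiftsInWindow: number; // one-level shifts already applied in this window
}

interface DriftDecision {
  allowed: boolean;
  state: DriftState; // state to persist after the decision
}

function tryConsumeShift(state: DriftState, maxDriftPerWeek: number, now: number): DriftDecision {
  // Start a fresh window once the previous one is at least seven days old.
  const expired = now - state.windowStart >= WEEK_MS;
  const current: DriftState = expired ? { windowStart: now, shiftsInWindow: 0 } : state;

  if (current.shiftsInWindow >= maxDriftPerWeek) {
    return { allowed: false, state: current };
  }
  return {
    allowed: true,
    state: { windowStart: current.windowStart, shiftsInWindow: current.shiftsInWindow + 1 },
  };
}
```

With the default adaptiveStyleMaxDriftPerWeek of 1, a second shift requested in the same window is refused, so a burst of feedback moves a style dimension by one level at most.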

Audit trail: Every adaptation is recorded in getAdaptationHistory() with dimension, from/to values, reason, and timestamp.

Configuration (GuardrailSettings → Behavior Adaptation)

| Setting | Default | Description |
| --- | --- | --- |
| adaptiveStyleEnabled | false | Master enable; no observation or adaptation when off |
| adaptiveStyleMaxDriftPerWeek | 1 | Max style-level shifts per 7-day period |

The Behavior Adaptation section in Guardrail Settings exposes these toggles alongside a Reset learned style button that calls AdaptiveStyleEngine.reset() via the kit:resetAdaptiveStyle IPC channel.

Integration points

  • daemon.ts — calls AdaptiveStyleEngine.observe(text) after every UserProfileService.ingestUserMessage()
  • daemon.ts — calls AdaptiveStyleEngine.observeFeedback(decision, reason) alongside UserProfileService.ingestUserFeedback()
  • GuardrailSettings.tsx — renders toggle, drift input, and reset button under "Behavior Adaptation"

3. Playbook-to-Skill Auto-Promotion Pipeline

File: src/electron/memory/PlaybookSkillPromoter.ts

Problem

PlaybookService detects repeated successful patterns. SkillProposalService has a full admin approval workflow for new skills. They were not connected — no automation converted proven patterns into governed, reusable skills.

Solution

When a playbook pattern is reinforced 3+ times (configurable threshold), PlaybookSkillPromoter.maybePropose() auto-generates a skill proposal with:

  • Problem statement — "Recurring task pattern detected (reinforced N times): …"
  • Evidence — reinforcement count, common tools, example requests
  • Draft skill — ID, name, description, prompt template (generated from evidence), icon, category
  • Required tools list

The proposal enters the existing SkillProposalService governance workflow — an admin sees the evidence and approves or rejects with one click. No skill is created automatically.

Flow:

Task completes successfully
  → PlaybookService.reinforceEntry() writes reinforcement memory
  → PlaybookService.events.emit("pattern-reinforced")
  → executor.ts calls PlaybookSkillPromoter.maybePropose() (async, fire-and-forget)
    → findCandidates() groups reinforcement memories by normalized task description
    → if count ≥ threshold: proposeSkill() via SkillProposalService.create()
    → proposal enters admin review queue

Cooldown: 10 minutes per workspace between promotion checks. Max 1 proposal per check.

Dedup: SkillProposalService.create() handles duplicate detection — returns duplicateOf if a similar proposal already exists.
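
The threshold, cooldown, and per-check cap can be sketched as below. This is an illustrative stand-in rather than the code in PlaybookSkillPromoter.ts: the `Reinforcement` shape, the normalization rule, and the in-memory cooldown map are assumptions, while the three constants reuse the documented defaults.

```typescript
const DEFAULT_PROMOTION_THRESHOLD = 3;
const PROMOTION_COOLDOWN_MS = 10 * 60 * 1000;
const MAX_PROPOSALS_PER_CHECK = 1;

// Hypothetical shape of a reinforcement memory.
interface Reinforcement {
  taskDescription: string;
}

// Assumed normalization: case-insensitive, whitespace-collapsed.
function normalizeDescription(description: string): string {
  return description.toLowerCase().replace(/\s+/g, " ").trim();
}

// Group reinforcements by normalized description and keep those at or above the threshold.
function findCandidatesSketch(
  memories: Reinforcement[],
  threshold: number = DEFAULT_PROMOTION_THRESHOLD,
): string[] {
  const counts = new Map<string, number>();
  for (const memory of memories) {
    const key = normalizeDescription(memory.taskDescription);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter((entry) => entry[1] >= threshold)
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0])
    .slice(0, MAX_PROPOSALS_PER_CHECK);
}

// Per-workspace cooldown between promotion checks.
const lastCheckByWorkspace = new Map<string, number>();

function shouldRunPromotionCheck(workspaceId: string, now: number): boolean {
  const last = lastCheckByWorkspace.get(workspaceId);
  if (last !== undefined && now - last < PROMOTION_COOLDOWN_MS) return false;
  lastCheckByWorkspace.set(workspaceId, now);
  return true;
}
```

Each surviving candidate would then be handed to the proposal service, which performs its own duplicate detection; nothing in this sketch creates a skill directly.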

Configuration

| Setting | Default | Description |
| --- | --- | --- |
| DEFAULT_PROMOTION_THRESHOLD | 3 | Min reinforcements before proposing |
| PROMOTION_COOLDOWN_MS | 10 min | Min time between checks per workspace |
| MAX_PROPOSALS_PER_CHECK | 1 | Max new proposals per check |

4. Cross-Channel Persona Coherence

File: src/electron/memory/ChannelPersonaAdapter.ts

Problem

The agent connects to 15+ channels but delivers the same personality regardless of channel norms. A Slack reply should feel different from an email reply — not because the agent has different knowledge or values, but because each platform has its own communication culture.

Solution

ChannelPersonaAdapter.adaptForChannel() takes the detected originChannel (from task.agentConfig.originChannel) and returns a channel-specific directive that is appended to (not replacing) the core personality prompt.

Channel profiles:

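| Channel | Length | Formatting | Emoji | Formal framing |
| --- | --- | --- | --- | --- |
| slack | Shorter | Structured | No | No |
| email | Longer | Structured | No | Yes (greeting + sign-off) |
| whatsapp | Shorter | Plain | Yes | No |
| imessage | Shorter | Plain | Yes | No |
| signal | Shorter | Plain | No | No |
| discord | Normal | Structured + markdown | Yes | No |
| teams | Normal | Structured | No | No |
| telegram | Shorter | Minimal | No | No |
| mattermost | Normal | Structured | No | No |
| matrix | Normal | Structured | No | No |
| googlechat | Shorter | Plain | No | No |
| twitch | Shorter | Plain | Yes | No |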

Group/public context overlay: When gatewayContext is "group" or "public", an additional privacy-aware directive is layered on (do not share sensitive information, be aware others are reading).
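
A profile lookup with a group/public overlay might look like the sketch below. Only four channels are included for brevity, the directive wording is invented for illustration, and `adaptForChannelSketch`, `ChannelProfile`, and the "private" context value are hypothetical; the real adapter's output text and fallback behavior are not documented here.

```typescript
type GatewayContext = "private" | "group" | "public";

interface ChannelProfile {
  length: "shorter" | "normal" | "longer";
  formatting: string;
  emoji: boolean;
  formalFraming: boolean;
}

// Values copied from the channel profile table; other channels omitted for brevity.
const PROFILES = new Map<string, ChannelProfile>([
  ["slack", { length: "shorter", formatting: "structured", emoji: false, formalFraming: false }],
  ["email", { length: "longer", formatting: "structured", emoji: false, formalFraming: true }],
  ["whatsapp", { length: "shorter", formatting: "plain", emoji: true, formalFraming: false }],
  ["discord", { length: "normal", formatting: "structured + markdown", emoji: true, formalFraming: false }],
]);

function describeProfile(profile: ChannelProfile): string {
  const parts = [
    `Length: ${profile.length}.`,
    `Formatting: ${profile.formatting}.`,
    profile.emoji ? "Emoji are acceptable." : "Avoid emoji.",
  ];
  if (profile.formalFraming) parts.push("Open with a greeting and close with a sign-off.");
  return parts.join(" ");
}

// Returns a directive to append to the core personality prompt.
// Assumption: an unknown channel contributes no channel-specific text.
function adaptForChannelSketch(
  channel: string | undefined,
  gatewayContext: GatewayContext = "private",
): string {
  const profile = channel === undefined ? undefined : PROFILES.get(channel.toLowerCase());
  const directives: string[] = [];
  if (profile !== undefined) directives.push(describeProfile(profile));
  if (gatewayContext === "group" || gatewayContext === "public") {
    directives.push("Others may be reading: do not share sensitive information.");
  }
  return directives.join("\n");
}
```

Because the result is appended rather than substituted, the core personality stays identical across channels and only the surface style varies.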

Configuration (GuardrailSettings → Behavior Adaptation)

| Setting | Default | Description |
| --- | --- | --- |
| channelPersonaEnabled | false | Enable channel-specific persona adaptation |

This toggle is exposed in the same Behavior Adaptation section as Adaptive Style.

Integration

executor.ts injects the channel directive when assembling the system prompt:

const channelDirective = ChannelPersonaAdapter.adaptForChannel(
  task.agentConfig.originChannel,
  gatewayContext,
);
// channelDirective is appended to personalityPrompt before budgeting

5. Evolution Metrics Service

File: src/electron/memory/EvolutionMetricsService.ts

Problem

CoWork OS tracks basic relationship stats (tasks completed, days together) but has no concept of measuring agent improvement over time. For enterprise buyers, quantifiable ROI is the difference between a tool and a strategic investment.

Solution

EvolutionMetricsService.computeSnapshot() computes 5 metrics on-demand from existing service data:

| Metric ID | Label | Source | Interpretation |
| --- | --- | --- | --- |
| correction_rate | Correction Rate | PlaybookService (failure entries) | Lower this week vs. prior 3-week avg → "improving" |
| adaptation_velocity | Style Adaptations | AdaptiveStyleEngine history | Any adaptations applied → agent is learning |
| knowledge_growth | Knowledge Graph | KnowledgeGraphService.getStats() | Entity and relationship count |
| task_success_rate | Task Success Rate | PlaybookService (success/failure entries) | Percentage of recorded tasks that succeeded |
| style_alignment | Style Alignment | AdaptiveStyleEngine history | Ratio of proactive vs. feedback-driven adaptations |

Each metric includes a trend ("improving" / "stable" / "declining") and a human-readable detail string.
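
To make two of these metrics concrete, the sketch below derives a correction-rate trend by comparing this week's count with the average of the prior weeks, and computes a rounded success percentage. The function names are hypothetical, and treating an exact tie (or missing history) as "stable" is an assumption; the service may use a tolerance band instead.

```typescript
type Trend = "improving" | "stable" | "declining";

// Fewer corrections this week than the prior average counts as improvement.
function correctionTrend(thisWeek: number, priorWeeks: number[]): Trend {
  if (priorWeeks.length === 0) return "stable"; // no baseline to compare against
  const priorAverage = priorWeeks.reduce((sum, n) => sum + n, 0) / priorWeeks.length;
  if (thisWeek < priorAverage) return "improving";
  if (thisWeek > priorAverage) return "declining";
  return "stable";
}

// Percentage of recorded tasks that succeeded, rounded to a whole number.
function successRatePercent(succeeded: number, failed: number): number {
  const total = succeeded + failed;
  return total === 0 ? 0 : Math.round((succeeded / total) * 100);
}
```

For instance, 103 successes and 20 failures give 103 / 123 ≈ 83.7%, which rounds to 84%.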

Overall Score: A composite 0–100 score derived from the trend directions, with bonus points for a high success rate and a large knowledge graph.

Daily Briefing integration

The evolution_metrics section is added to BriefingSectionType and enabled by default in DEFAULT_BRIEFING_CONFIG. DailyBriefingService.buildEvolutionMetrics() calls EvolutionMetricsService.computeSnapshot() and maps metrics to BriefingItem[].

Example briefing output:

Agent Evolution (Day 45, 123 tasks completed):
  [+] Task Success Rate: 84% — 103 succeeded, 20 failed out of 123 recorded tasks
  [+] Knowledge Graph: 47 entities — 47 entities, 82 relationships, 310 observations
  [=] Correction Rate: 2/week — Correction rate is stable
  [+] Style Adaptations: 3 total — 89 messages observed, 3 adaptations applied
  [+] Style Alignment: 100% — No adaptations yet — using default style
  Overall Evolution Score: 72/100


6. Daily Operational Log

File: src/electron/memory/DailyLogService.ts

Purpose

Provides structured per-day journaling as input for the summary-first memory pipeline. Entries are written to .cowork/memory/daily/<YYYY-MM-DD>.md.

When to write entries

| Category | Trigger |
| --- | --- |
| feedback | User thumbs-up/down events |
| task | Task completions |
| decision | Notable agent decisions |
| observation | High-value memory saves or corrections |

Raw log files are never injected into prompts directly. They exist only as input for DailyLogSummarizer.

Entry format

## 2026-03-14T15:30:00.000Z
source: user
category: feedback
taskId: task-abc123
tags: tone, correction

User flagged response as "wrong tone".

API

await DailyLogService.appendEntry(workspacePath, {
  timestamp: new Date().toISOString(),
  source: "user",
  category: "feedback",
  text: "User flagged response as wrong tone.",
  taskId: "task-abc123",
  tags: ["tone"],
});
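
For orientation, here is a sketch of how an entry object could be serialized into the entry format shown above and mapped to its per-day file. `formatDailyLogEntry` and `dailyLogPath` are hypothetical helpers, not part of the documented API, and deriving the file name from the UTC date prefix of the timestamp is an assumption.

```typescript
type DailyLogCategory = "feedback" | "task" | "decision" | "observation";

interface DailyLogEntry {
  timestamp: string; // ISO 8601, e.g. from new Date().toISOString()
  source: string;
  category: DailyLogCategory;
  text: string;
  taskId?: string;
  tags?: string[];
}

// Serialize one entry: a "## <timestamp>" heading, key/value lines, a blank line, then the text.
function formatDailyLogEntry(entry: DailyLogEntry): string {
  const header = [
    `## ${entry.timestamp}`,
    `source: ${entry.source}`,
    `category: ${entry.category}`,
  ];
  if (entry.taskId !== undefined) header.push(`taskId: ${entry.taskId}`);
  if (entry.tags !== undefined && entry.tags.length > 0) {
    header.push(`tags: ${entry.tags.join(", ")}`);
  }
  return `${header.join("\n")}\n\n${entry.text}\n`;
}

// Assumed mapping: the first 10 characters of the ISO timestamp (YYYY-MM-DD) name the file.
function dailyLogPath(workspacePath: string, timestamp: string): string {
  return `${workspacePath}/.cowork/memory/daily/${timestamp.slice(0, 10)}.md`;
}
```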

7. Daily Log Summarizer

File: src/electron/memory/DailyLogSummarizer.ts

Purpose

Produces ranked MemoryFragment objects from pre-written daily summary files (.cowork/memory/summaries/<YYYY-MM-DD>.md). This completes the summary-first retrieval pipeline: summaries rank higher than raw log snippets but lower than user profile and relationship memory.

Directory layout

.cowork/
  memory/
    daily/
      2026-03-14.md    ← raw operational log (DailyLogService writes)
    summaries/
      2026-03-14.md    ← synthesized summary (written externally, e.g. cron)

Summary file format

---
updated: 2026-03-14
source: daily_log_synthesizer
day: 2026-03-14
---

# Daily Summary

## Important Decisions
- ...

## User Preferences Observed
- ...

## Active Threads
- ...

## Corrections / Lessons
- ...

## Follow-ups
- ...

Retrieval ranking

| Source | Base relevance | Notes |
| --- | --- | --- |
| user_profile | 0.70 | Always somewhat relevant |
| daily_summary | 0.55 × recency decay | Recency half-life = 7 days |
| Raw daily logs | never returned | Not injected by this service |
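
A 7-day half-life implies that a summary's relevance halves for every week of age. The sketch below assumes a plain exponential decay of the form base × 0.5^(age / half-life); the exact curve used by DailyLogSummarizer is not shown in this document, and `dailySummaryRelevance` is a hypothetical name.

```typescript
const DAILY_SUMMARY_BASE_RELEVANCE = 0.55;
const RECENCY_HALF_LIFE_DAYS = 7;

// Relevance of a daily summary that is `ageDays` old (negative ages are treated as 0).
function dailySummaryRelevance(ageDays: number): number {
  const decay = Math.pow(0.5, Math.max(0, ageDays) / RECENCY_HALF_LIFE_DAYS);
  return DAILY_SUMMARY_BASE_RELEVANCE * decay;
}
```

Under this assumption a summary from today scores 0.55, one from a week ago about 0.275, and one from two weeks ago about 0.1375, so even the freshest summary stays below the 0.70 base relevance of user_profile.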

Integration

MemorySynthesizer.synthesize() calls DailyLogSummarizer.getRecentSummaryFragments() for the last 7 days and adds the results to the structured-memory lane as daily_summary fragments. They render under ## Recent Summaries inside <cowork_structured_memory>.

Helper

DailyLogSummarizer.countRecentSummaries(workspacePath, 7)
// → number of summary files present in the last 7 days
// Used by the Improvement Signals card

8. Task and Message Feedback

UI: src/renderer/components/MainContent.tsx (task completion banner and shared feedback plumbing)

IPC: kit:submitMessageFeedback → UserProfileService.ingestUserFeedback()

Interaction

Completed tasks now expose 👍 / 👎 controls in the completion banner so users can rate the overall outcome. The same IPC contract still supports structured message-level feedback for adaptation-oriented flows. Task-level and message-level thumbs-down share one structured reason vocabulary:

| Reason key | Label |
| --- | --- |
| incorrect | Incorrect |
| too_verbose | Too verbose |
| ignored_instructions | Ignored instructions |
| wrong_tone | Wrong tone |
| unsafe | Unsafe / unwanted |

IPC payload

window.electronAPI.submitMessageFeedback({
  taskId: string,
  messageId?: string,          // present for message-scoped feedback
  decision: "accepted" | "rejected",
  reason?: string,             // one of the keys above
  note?: string,               // optional free-text (future)
  kind?: "message" | "task",
});

Feedback is routed to UserProfileService.ingestUserFeedback() and (via daemon) to AdaptiveStyleEngine.observeFeedback().
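
Before a payload is sent, the renderer could check it against the reason vocabulary. The sketch below is an assumption-laden illustration: `MessageFeedbackPayload`, `isFeedbackReason`, and `validateFeedback` are hypothetical, and the rules that a reason only accompanies a rejection and that message-scoped feedback carries a messageId are inferred from the payload comments rather than confirmed.

```typescript
// Reason keys from the table in this section.
const FEEDBACK_REASONS = [
  "incorrect",
  "too_verbose",
  "ignored_instructions",
  "wrong_tone",
  "unsafe",
] as const;

type FeedbackReason = (typeof FEEDBACK_REASONS)[number];

// Hypothetical name for the documented payload shape.
interface MessageFeedbackPayload {
  taskId: string;
  messageId?: string;
  decision: "accepted" | "rejected";
  reason?: string;
  note?: string;
  kind?: "message" | "task";
}

function isFeedbackReason(value: string): value is FeedbackReason {
  return (FEEDBACK_REASONS as readonly string[]).indexOf(value) !== -1;
}

// Returns a list of problems; an empty list means the payload looks well-formed.
function validateFeedback(payload: MessageFeedbackPayload): string[] {
  const errors: string[] = [];
  if (payload.reason !== undefined) {
    if (!isFeedbackReason(payload.reason)) errors.push(`unknown reason: ${payload.reason}`);
    if (payload.decision !== "rejected") errors.push("reason is only meaningful for rejected feedback");
  }
  if (payload.kind === "message" && payload.messageId === undefined) {
    errors.push("message-scoped feedback needs a messageId");
  }
  return errors;
}
```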


Governance Summary

All improvements respect CoWork OS's security-first positioning:

| Improvement | Guardrail flag | Default | Rate limit | Audit trail |
| --- | --- | --- | --- | --- |
| Layered Memory Runtime | defaultArchiveInjectionEnabled controls archive injection; hot/structured memory default on | Curated + structured on, archive off | Token budgets per section | Source attribution by lane + tool-level recall traces |
| Adaptive Style Engine | adaptiveStyleEnabled | Off | adaptiveStyleMaxDriftPerWeek (default 1) | getAdaptationHistory() |
| Playbook-to-Skill | | Always active (post-task hook) | 10 min cooldown, max 1/check | Full proposal review workflow |
| Channel Persona | channelPersonaEnabled | Off | | Visible in system prompt |
| Evolution Metrics | | Computed on-demand | | Read-only, no mutations |
| Daily Log | | Available when a writer uses DailyLogService | File append only | Per-day markdown files |
| Daily Summaries | | Active when summary files exist | Token budget (ranked) | Summary files in .cowork/memory/summaries/ |
| Message Feedback | | Always visible on completed messages | IPC: limited tier | Routed to UserProfileService |

Test Coverage

| Service | Test file |
| --- | --- |
| MemorySynthesizer | src/electron/memory/__tests__/MemorySynthesizer.test.ts |
| CuratedMemoryService | src/electron/memory/__tests__/CuratedMemoryService.test.ts |
| SessionRecallService | src/electron/memory/__tests__/SessionRecallService.test.ts |
| LayeredMemoryIndexService | src/electron/memory/__tests__/LayeredMemoryIndexService.test.ts |
| AdaptiveStyleEngine | src/electron/memory/__tests__/AdaptiveStyleEngine.test.ts |
| PlaybookSkillPromoter | src/electron/memory/__tests__/PlaybookSkillPromoter.test.ts |
| ChannelPersonaAdapter | src/electron/memory/__tests__/ChannelPersonaAdapter.test.ts |
| EvolutionMetricsService | src/electron/memory/__tests__/EvolutionMetricsService.test.ts |