CoWork OS Architecture
Synced from github.com/CoWork-OS/CoWork-OS/docs
CoWork OS is a local-first desktop runtime and AI workbench for task execution, generated knowledge-work artifacts, background operator loops, and multi-surface automation.
Core Architecture
- Electron main process: task orchestration, agent runtime, heartbeat orchestration, IPC, and tool execution
- React renderer: desktop UI, Mission Control, task timeline, settings, and monitoring surfaces
- Tool and connector layer: file, shell, browser, web, native integrations, document generation/compilation tools including source-first LaTeX PDF compilation, MCP connectors, remote execution, and computer use (screenshot, click, type_text, and related tools) as a governed desktop-GUI lane (platform helper, single-session lock, policy-gated routing). See Computer use.
- Composer mention layer: the renderer and Electron preload expose a grouped @ autocomplete for Agents, configured Integrations, and Files. Integration mentions are resolved locally, render as rich chips, persist in task/session metadata, and inject soft routing guidance into the executor without changing permissions or allowedTools. See Composer Mentions.
- Message shortcut layer: the renderer exposes one / picker for deterministic app commands and skill-backed workflow shortcuts. Shared app command parsing handles /schedule, /clear, /plan, /cost, /compact, /doctor, and /undo; plugin-pack aliases resolve to target skill IDs before generic skill slash execution. See Message Box Shortcuts.
- Chronicle screen-context lane: desktop-only passive recent-screen capture, local ranking/OCR enrichment, source resolution, provenance-aware screen_context_resolve tool exposure, and promotion of task-used observations into workspace-backed screen_context evidence plus optional linked background memory generation. See Chronicle.
- Managed resource layer: first-class ManagedAgent, ManagedEnvironment, and ManagedSession control-plane resources package reusable execution definitions and durable run identities on top of existing Task, AgentTeamRun, and SessionRuntime primitives. See Managed Agents.
- Automation/event layer: scheduled tasks, webhooks, channel events, and MCP connector/resource notifications all flow through the same trigger engine
- Turn and tool orchestration: a session-scoped SessionRuntime owns task-session state, session checklists, permission state, turn coordination, resume/snapshot persistence, and task projection, while a lower-level TurnKernel handles the active step, follow-up, or text turn; a metadata-driven ToolScheduler batches concurrency-safe reads, serializes conflicting writes, and keeps tool-result ordering stable
- Prompt stack and tool guidance: execution prompts are assembled from named session- and turn-scoped sections with explicit budgets; stable session sections form a provider-cacheable prefix and volatile turn sections stay uncached; layered memory injects only L0 Identity + L1 Essential Story by default while L2 Topic Packs and L3 Deep Recall remain tool-driven; retry-aware recovery guidance can inject attempt/retry state plus recent session evidence; and visible tools receive prompt-aware descriptions rendered only after policy and mode filtering
- Additive skill runtime: canonical task text remains immutable for skill routing purposes, while use_skill attaches structured SkillApplication context plus scoped runtime directives instead of rewriting the task prompt
- Delegation graph: delegated work now runs through a normalized orchestration graph engine so spawned agents, team work, workflow phases, and ACP tasks share one run/node/event model
- Worker roles and verification: built-in worker roles (researcher, implementer, verifier, synthesizer) carry hard tool scopes, delegated work receives a structured brief instead of raw prompt passthrough, and verification runs use both early nudges and a dedicated verdict/report contract
- Adaptive model routing: the executor can switch into a workflow-pipeline path where decomposed phases run as child tasks with per-phase model overrides or capability-based auto-selection
- Federated agent orchestration: ACP registry + remote invocation let orchestrators target local roles or remote A2A-compatible agents under shared approval and policy controls
- Local persistence: SQLite, local files, curated hot-memory entries, archive memory rows and summaries, transcript spans/checkpoints with structured summaries + verbatim evidence packets, knowledge graph state including temporal edge validity, run records, orchestration graph nodes/events, ACP agent registrations and ACP task state, usage telemetry, feedback events, session_runtime_v2 task snapshots, managed-agent tables (managed_agents, managed_agent_versions, managed_environments, managed_sessions, managed_session_events), .cowork/memory/topics, and workspace-kit contracts in .cowork/
- Artifact preview layer: file preview IPC resolves workspace-contained outputs, extracts document content, and enriches artifacts with renderer-ready previews. Spreadsheet previews are extracted in Electron into shared sheet structures (spreadsheetPreview) for sheet names, used bounds, display values, formulas, styles, and column widths; workbook formats use exceljs, while CSV/TSV use a delimited parser and save back with the original delimiter. Native/app-owned spreadsheet formats such as Numbers and Google Sheets shortcuts are recognized as artifacts but open externally. Word-style document previews are extracted into documentPreview; DOCX-like files use Mammoth plus editable block metadata, RTF and ODT/OTT use best-effort local text extraction, legacy DOC attempts local converter fallback, and Pages is recognized for external handling. Web page previews are extracted into webPreview; HTML/HTM files and built React output entrypoints return sandbox-ready iframe HTML with local assets inlined where possible, while React-style projects without build output return a structured preview-unavailable state. Existing content and htmlContent fallbacks remain for compatibility. PPTX previews use presentationPreview with fast text/notes extraction, cached imageUrl slide PNGs, background full rendering through Codex @oai/artifact-tool, local soffice + pdftoppm fallback, in-flight render dedupe, and text-only fallback when image rendering is unavailable.
- Browser V2 workbench layer: interactive browser-use tools target a renderer-owned Electron webview by default, with main-process automation owned by BrowserSessionManager and routed through Electron webContents.debugger / CDP. The main process maps { taskId, sessionId } to the webview's webContentsId; browser tools route navigation, accessibility snapshots, ref-aware click/fill/type/read/hover/drag/upload actions, dialogs, downloads, diagnostics, emulation, tracing, and screenshots to that visible session. The renderer opens the resizable right-sidebar/fullscreen Browser Workbench on demand and carries status, screenshot capture, annotation handoff, diagnostics UI, snapshot overlay state, and cursor events so users can see agent movement over the page. The embedded session uses a persistent per-workspace partition isolated from system Chrome; explicit forced-headless, profile, browser-channel, and Chrome DevTools attach options keep Playwright/local and external-CDP fallback paths available for remote/headless or signed-in Chrome/Edge cases. Real-browser profile control requires explicit consent. See Browser Workbench and Browser V2 Architecture.
- Permission engine: layered tool approval decisions combine workspace capabilities, explicit rules, hard guardrails, session grants, workspace-local policy files, and mode defaults including dangerous_only, with workspace rule browsing/removal in Settings
- Runtime visibility surfaces: the task runtime emits learning progression, unified recall, persistent shell, live routing events, semantic tool-batch summaries, curated external progress relays for text-first channels, session-checklist events, and follow-up completion events into Mission Control and the renderer so operator state stays visible instead of hidden in services
- Everything Workbench artifact surfaces: completion cards, timeline details, and Files panels share output metadata so generated docs, sheets, decks, web pages, PDFs, previews, and live browser sessions stay attached to the task that produced or used them. Spreadsheet outputs render as compact cards; editable workbook/CSV/TSV files open into a sidebar/fullscreen artifact workbench with editable grid state, persisted sidebar width, and fullscreen follow-up context, while native app formats keep external-app/folder actions. Word-style document outputs render as compact cards; DOCX opens into a direct-edit sidebar/fullscreen document workbench with Google Docs-style controls, save/copy actions, persisted sidebar width, fullscreen follow-up context, and preview refresh after follow-up edits, while non-editable document formats keep best-effort preview and external-app/folder actions. Presentation outputs render as compact cards; PPTX opens into a sidebar/fullscreen presentation workbench with thumbnails, navigation, zoom, speaker notes, cached slide rendering, persisted sidebar width, and deferred refresh after follow-up completion, while legacy PowerPoint formats keep external actions. Web page outputs render as compact cards; generated HTML/HTM and built React output open into a sandboxed sidebar/fullscreen iframe workbench with browser/folder/copy actions, persisted sidebar width, and deferred refresh after follow-up completion, while React-style projects without build output show a build-output-needed state instead of starting a dev server. Live website testing opens a browser workbench in the same right-sidebar/fullscreen model so the agent can interact with a visible page without launching an external browser. LaTeX PDFs compiled through compile_latex carry sourcePath metadata so the renderer can pair the editable .tex source with the generated PDF in one artifact workbench.
- Lifecycle reconciliation: completion persists terminal task state before emitting terminal events, and resume paths re-derive canonical persisted status before writing executing, so late approval or follow-up resumes cannot reopen completed tasks
- Completion hardening: verified-mode evidence bundles, step-intent alignment/decomposition heuristics, read-only entropy sweeps, and verifier verdict/report projection make completion checks more explicit without mutating the task's final result
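The ToolScheduler behavior described in the list above (batch concurrency-safe reads, serialize conflicting writes, keep result ordering stable) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the ToolCall shape, the concurrencySafe flag, the tool names, and the function names are hypothetical and are not taken from the CoWork OS source.

```typescript
// Hypothetical shapes for illustration; not the real ToolScheduler API.
interface ToolCall {
  id: string;
  tool: string;
  concurrencySafe: boolean; // assumed metadata flag, e.g. true for pure reads
}

// Group adjacent concurrency-safe calls into one batch. Every other call
// gets a batch of its own, so a write never overlaps with anything else.
function planBatches(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    if (call.concurrencySafe) {
      current.push(call);
      continue;
    }
    if (current.length > 0) {
      batches.push(current);
      current = [];
    }
    batches.push([call]);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Run batches one after another; calls inside a batch run concurrently.
// Promise.all preserves input order, so results line up with the calls.
async function runScheduled<T>(
  calls: ToolCall[],
  execute: (call: ToolCall) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const batch of planBatches(calls)) {
    results.push(...(await Promise.all(batch.map(execute))));
  }
  return results;
}

// Tool names below are placeholders.
const plan = planBatches([
  { id: "1", tool: "read_file", concurrencySafe: true },
  { id: "2", tool: "read_file", concurrencySafe: true },
  { id: "3", tool: "write_file", concurrencySafe: false },
  { id: "4", tool: "read_file", concurrencySafe: true },
]);
console.log(plan.map((batch) => batch.map((call) => call.id)));
// batches by id: ["1","2"], then ["3"], then ["4"]
```

Splitting planning from execution keeps the ordering rule testable without running any tool. A real scheduler would probably also check whether two writes actually touch the same resource before serializing them.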
Profiles and Isolation
CoWork supports multiple app profiles so one install can keep separate operating environments for different users, clients, or trust zones.
- each profile has its own user-data root, SQLite database, encrypted settings, channel configs, managed skills, and session history
- profile export/import moves a complete app profile bundle without merging it into another profile implicitly
- workspaces still live outside the app profile, but the profile controls the credentials, automations, channels, and runtime state that operate on those workspaces
- profile switching is an app-level concern, separate from personality export/import or workspace-kit files
Heartbeat V3
Heartbeat v3 is the default background automation architecture.
- Signal ledger: ambient changes, mentions, manual wakes, and awareness events emit normalized heartbeat signals instead of accumulating raw wake requests
- Pulse: cheap, deterministic, non-LLM state reduction that evaluates merged signals, due proactive work, checklist cadence, foreground contention, and dispatch guardrails
- Dispatch: escalation lane invoked only when Pulse decides the situation warrants user-visible or task-visible work
- Run records: every Pulse and Dispatch execution is tracked, and any heartbeat-created task is linked back to its originating heartbeat run
- Defer and compress: foreground manual work suppresses churn by compressing pending signals into resumable deferred state instead of growing a queue
See Heartbeat v3 for the detailed runtime contract.
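As a rough illustration of the Pulse stage, the sketch below reduces a set of signals to one of three outcomes without calling a model. The signal fields, the urgency score, the threshold, and all type and function names are assumptions made for this example; the actual contract is the one described in the Heartbeat v3 document.

```typescript
// Hypothetical types; field names and scoring are illustrative only.
type SignalKind = "ambient_change" | "mention" | "manual_wake" | "awareness";

interface HeartbeatSignal {
  kind: SignalKind;
  key: string; // assumed merge key, e.g. a channel or file identifier
  urgency: number; // assumed score in the range 0..1
}

interface PulseInput {
  signals: HeartbeatSignal[];
  foregroundBusy: boolean; // manual foreground work is in progress
  dispatchThreshold: number; // assumed dispatch guardrail
}

type PulseDecision =
  | { action: "idle" }
  | { action: "defer"; compressed: HeartbeatSignal[] }
  | { action: "dispatch"; signals: HeartbeatSignal[] };

// Merge signals that share kind and key, keeping the most urgent one,
// so repeated wakes do not grow a queue.
function mergeSignals(signals: HeartbeatSignal[]): HeartbeatSignal[] {
  const merged = new Map<string, HeartbeatSignal>();
  for (const signal of signals) {
    const id = signal.kind + ":" + signal.key;
    const previous = merged.get(id);
    if (!previous || signal.urgency > previous.urgency) merged.set(id, signal);
  }
  return Array.from(merged.values());
}

// Deterministic, non-LLM reduction: same input always yields the same decision.
function pulse(input: PulseInput): PulseDecision {
  const merged = mergeSignals(input.signals);
  if (merged.length === 0) return { action: "idle" };
  if (input.foregroundBusy) return { action: "defer", compressed: merged };
  const peak = Math.max(...merged.map((signal) => signal.urgency));
  return peak >= input.dispatchThreshold
    ? { action: "dispatch", signals: merged }
    : { action: "idle" };
}

const decision = pulse({
  signals: [
    { kind: "mention", key: "chat-1", urgency: 0.4 },
    { kind: "mention", key: "chat-1", urgency: 0.9 },
  ],
  foregroundBusy: true,
  dispatchThreshold: 0.5,
});
console.log(decision.action); // "defer"
```

The defer branch returns the merged signals rather than the raw list, which mirrors the "defer and compress" idea: pending work is kept as compact resumable state.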
Workspace Kit
The .cowork/ workspace kit holds durable human-edited operating context.
- BOOTSTRAP.md is a one-time onboarding checklist
- HEARTBEAT.md is reserved for recurring heartbeat checklist work
- USER.md and MEMORY.md can contain both human-authored content and auto-managed curated-memory blocks
- project-scoped context lives under .cowork/projects/<projectId>/
Skills Runtime Model
The skill system now follows an additive contract:
- the canonical user request is resolved as rawPrompt -> userPrompt -> prompt
- task creation normalizes prompt fields centrally so new tasks always persist canonical prompt data
- skill routing works as shortlist-and-hint guidance, not prompt takeover
- slash commands can still invoke skills deterministically, including first-class bundled workflows such as /simplify, /batch, /llm-wiki, direct skill IDs, and plugin-pack aliases from the message box shortcut picker, but the result is applied additively
- use_skill returns structured context plus scoped directives, not a replacement task definition
- the executor builds runtime context from canonical prompt + task notes + applied skill content
- the renderer always shows canonical task text and renders applied skills separately
This prevents skills from hijacking the task while preserving proactive skill selection.
See Skills Runtime Model for the detailed contract.
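The additive contract can be sketched as follows. This example assumes that "rawPrompt -> userPrompt -> prompt" is a fallback order in which the first non-empty field wins; that reading, the SkillApplication fields, and both function names are assumptions for illustration, not the documented implementation.

```typescript
// Hypothetical shapes; only the three prompt field names come from the text above.
interface TaskPromptFields {
  rawPrompt?: string;
  userPrompt?: string;
  prompt?: string;
}

interface SkillApplication {
  skillId: string;
  content: string; // assumed: guidance text contributed by the skill
  directives: string[]; // assumed: scoped runtime directives
}

// Assumed fallback order: the first non-empty field is the canonical request.
function resolveCanonicalPrompt(task: TaskPromptFields): string {
  for (const candidate of [task.rawPrompt, task.userPrompt, task.prompt]) {
    if (candidate && candidate.trim().length > 0) return candidate;
  }
  return "";
}

// Skills are appended as separate sections; the task text is never rewritten.
function buildRuntimeContext(
  task: TaskPromptFields,
  notes: string[],
  skills: SkillApplication[],
): string {
  const sections = ["Task:\n" + resolveCanonicalPrompt(task)];
  if (notes.length > 0) sections.push("Notes:\n" + notes.join("\n"));
  for (const skill of skills) {
    sections.push("Applied skill " + skill.skillId + ":\n" + skill.content);
  }
  return sections.join("\n\n");
}

const context = buildRuntimeContext(
  { userPrompt: "Summarize the Q3 notes", prompt: "legacy text" },
  ["Keep it under one page"],
  [{ skillId: "simplify", content: "Prefer short sentences.", directives: [] }],
);
console.log(context);
```

Because the canonical text sits in its own section, a renderer can show it unchanged and list applied skills separately.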
Gateway Message Lifecycle
Remote channel messages are routed through a shared lifecycle for command dispatch, task-session ownership, follow-ups, cancellations, progress delivery, skill slash invocation, and scheduled-task output delivery. The gateway treats recognized slash commands as owned commands, not task text, and uses generation guards so stale task updates are not delivered after a chat starts fresh or cancels.
See Gateway Message Lifecycle for the user-facing command and delivery model.
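One way to picture a generation guard is a per-chat counter that is bumped whenever the chat starts fresh or cancels, with each task remembering the value it started under. The class and method names below are hypothetical; this is a sketch of the general technique, not the gateway's code.

```typescript
// Hypothetical guard; names are illustrative only.
class ChatGenerationGuard {
  private generations = new Map<string, number>();

  // Generation a newly started task should capture.
  current(chatId: string): number {
    return this.generations.get(chatId) ?? 0;
  }

  // Call when the chat starts fresh or cancels its active task.
  bump(chatId: string): number {
    const next = this.current(chatId) + 1;
    this.generations.set(chatId, next);
    return next;
  }

  // An update is deliverable only if it was produced under the live generation.
  isFresh(chatId: string, generation: number): boolean {
    return generation === this.current(chatId);
  }
}

// Deliver an update unless it is stale; returns whether it was sent.
function deliverIfFresh(
  guard: ChatGenerationGuard,
  chatId: string,
  generation: number,
  send: () => void,
): boolean {
  if (!guard.isFresh(chatId, generation)) return false;
  send();
  return true;
}

const guard = new ChatGenerationGuard();
const sent: string[] = [];
const startedUnder = guard.current("chat-1"); // task captures generation 0
deliverIfFresh(guard, "chat-1", startedUnder, () => sent.push("progress 1"));
guard.bump("chat-1"); // user clears or cancels the chat
deliverIfFresh(guard, "chat-1", startedUnder, () => sent.push("progress 2"));
console.log(sent); // only "progress 1" was delivered
```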
Repo Landmarks
- src/electron/: main-process runtime, services, database, scheduling, monitoring
- src/electron/agent/runtime/SessionRuntime.ts: canonical task-session owner for execution, recovery, snapshotting, and task projection
- src/renderer/components/RightPanel.tsx: renderer-side read-only projection of the latest session checklist state
- src/electron/agent/runtime/PermissionEngine.ts: layered tool-approval evaluation, rule matching, and fallback escalation
- src/renderer/: React UI and settings surfaces
- src/shared/: shared contracts and types
- docs/: product and architecture documentation
- .cowork/: local workspace operating context
Computer use
Native GUI control is implemented in the main process (src/electron/computer-use/, src/electron/agent/tools/computer-use-tools.ts) with a persistent platform helper runtime and a singleton session that coordinates single-task ownership plus Esc abort. macOS uses helper-targeted permission bootstrap; Windows uses a bundled Win32 helper for visible, non-minimized windows. Tool policy and the executor only expose the computer-use lane when native desktop GUI intent is detected so routine web and repo work stays on browser and shell paths. Product-level behavior, permissions, and troubleshooting are documented in Computer use.
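The single-task ownership and abort behavior can be pictured as a small lock object. The class below is a hypothetical sketch: its name, its methods, and the way an Esc press maps to abort() are assumptions, and it leaves out the platform helper entirely.

```typescript
// Hypothetical lock; not the real computer-use session implementation.
class ComputerUseSessionLock {
  private owner: string | null = null;
  private aborted = false;

  // Returns true when taskId owns the desktop lane after the call.
  acquire(taskId: string): boolean {
    if (this.owner === null) {
      this.owner = taskId;
      this.aborted = false;
      return true;
    }
    return this.owner === taskId;
  }

  // Assumed to be wired to an Esc handler: stops the current owner's actions.
  abort(): void {
    if (this.owner !== null) this.aborted = true;
  }

  // Tools would check this before every GUI action.
  canAct(taskId: string): boolean {
    return this.owner === taskId && !this.aborted;
  }

  // Only the owner can release; releasing clears the abort flag.
  release(taskId: string): void {
    if (this.owner === taskId) {
      this.owner = null;
      this.aborted = false;
    }
  }
}

const lock = new ComputerUseSessionLock();
console.log(lock.acquire("task-a")); // true
console.log(lock.acquire("task-b")); // false, task-a still owns the lane
lock.abort();
console.log(lock.canAct("task-a")); // false after abort
lock.release("task-a");
console.log(lock.acquire("task-b")); // true
```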
Chronicle
Chronicle is implemented as a dedicated desktop screen-context subsystem under src/electron/chronicle/.
- ChronicleCaptureService maintains the local recent-screen ring buffer in app-local storage
- ChronicleSelector ranks frames by recency, app/window metadata, and OCR-derived local text
- ChronicleSourceResolver enriches Chronicle captures with frontmost URL/file/app references when available
- ChronicleProvenance turns screen-derived text into untrusted, provenance-tagged context
- ChronicleObservationRepository promotes only task-used observations into .cowork/chronicle/
- ChronicleMemoryService can create linked screen_context memory rows through the normal memory pipeline
- screen_context_resolve is registered from the agent tool registry, exposed through the dedicated built-in chronicle tool category, and hidden when Chronicle is disabled, paused, or unavailable
- Mission Control and runtime visibility consume promoted observations as screen_context, not as a separate memory database
- renderer surfaces for Chronicle now live in Memory Hub, Memory settings, task-creation toggles, and tray/menu-bar controls
Chronicle shares the Screen Recording prerequisite with computer use, but it is a different lane: local screen understanding rather than direct GUI control. Product-level behavior, testing, and privacy boundaries are documented in Chronicle.
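To make the ranking inputs concrete, the sketch below scores frames on the three signals named for ChronicleSelector: recency, app/window metadata, and OCR text. The frame shape, the query shape, and the 0.4/0.3/0.3 weights are assumptions chosen for illustration; the real selector may weigh or combine these differently.

```typescript
// Hypothetical frame and query shapes; weights are arbitrary illustration values.
interface ScreenFrame {
  id: string;
  capturedAt: number; // epoch milliseconds
  appName: string;
  windowTitle: string;
  ocrText: string;
}

interface RankQuery {
  now: number; // passed in so scoring stays deterministic
  terms: string[];
  maxAgeMs: number;
}

function scoreFrame(frame: ScreenFrame, query: RankQuery): number {
  // Recency decays linearly from 1 (just captured) to 0 (maxAgeMs old or older).
  const age = Math.max(0, query.now - frame.capturedAt);
  const recency = Math.max(0, 1 - age / query.maxAgeMs);

  const meta = (frame.appName + " " + frame.windowTitle).toLowerCase();
  const text = frame.ocrText.toLowerCase();
  let metaHits = 0;
  let textHits = 0;
  for (const term of query.terms) {
    const needle = term.toLowerCase();
    if (meta.includes(needle)) metaHits += 1;
    if (text.includes(needle)) textHits += 1;
  }
  const termCount = Math.max(1, query.terms.length);
  return 0.4 * recency + 0.3 * (metaHits / termCount) + 0.3 * (textHits / termCount);
}

// Highest score first; the input array is left untouched.
function rankFrames(frames: ScreenFrame[], query: RankQuery): ScreenFrame[] {
  return frames.slice().sort((a, b) => scoreFrame(b, query) - scoreFrame(a, query));
}

const ranked = rankFrames(
  [
    { id: "a", capturedAt: 9000, appName: "Terminal", windowTitle: "zsh", ocrText: "npm test" },
    { id: "b", capturedAt: 5000, appName: "Browser", windowTitle: "Invoice 42", ocrText: "invoice total due" },
  ],
  { now: 10000, terms: ["invoice"], maxAgeMs: 10000 },
);
console.log(ranked.map((frame) => frame.id)); // "b" ranks ahead of "a"
```

In this example the older frame wins because it matches the query in both its window title and its OCR text, which outweighs the newer frame's recency advantage.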
Update Rule
If defaults, behavior, or architecture change, update this file in the same PR.