Tech stack — locked in PRD §8 + SRS DEC-001..DEC-066

The stack, top to bottom

Eight tiers, sixteen major picks, every choice motivated by the same bias: open-source, self-hostable, auditable primitives.

Every layer of CyberOS is locked in PRD §8 and the SRS DEC-001..DEC-066 log. The bias is consistent: open-source, self-hostable, auditable primitives that never lock the company to a vendor that could later compromise Vietnamese data sovereignty or push the cost ceiling above $4/active-user/month at 50-tenant scale. This page traces each tier: the specific framework chosen, the rejected alternatives, the production cost shape, and the migration door if the choice goes wrong.


Eight tiers, one supergraph

CyberOS has more than 8 distinct concerns (we enumerate 16 below), but they cluster cleanly into 8 architectural tiers. Each tier is a separate deployment unit, scales independently, and exposes a stable contract to the tier above.

```mermaid
flowchart TB
    subgraph T1 ["T1 · Persona / Agent layer"]
        LANG["LangGraph supervisor (StateGraph + interrupt)"]
        SKILLS["Anthropic Skills format · 10 C-level skills hot-reload"]
        LITELLM_T1["LiteLLM client (routing core)"]
    end
    subgraph T2 ["T2/T3 · Frontend layer"]
        HOST["Host shell · Vite + React 19 + Tauri (desktop)"]
        REMOTES["Module remotes · Webpack 5 + Module Federation"]
    end
    subgraph T3 ["T4/T5/T6 · API + Agent surface"]
        APOLLO["Apollo Router · GraphQL Federation v2.5+"]
        MCPGW["MCP Gateway · Streamable HTTP · 2025-11-25"]
        AIGW["AI Gateway · LiteLLM router · Bedrock primary"]
    end
    subgraph T4 ["T7 · Backend services"]
        SUBGRAPHS["22 subgraphs · TypeScript (Yoga) or Rust (async-graphql)"]
        MCPSERVERS["22 MCP servers · per-module · TS SDK or mcp-rs"]
    end
    subgraph T5 ["T8/T9 · Data + search"]
        PG["PostgreSQL 17 + pgvector HNSW + Apache AGE 1.5 + PGroonga"]
        EMBED["BGE-M3 embedder + BGE-rerank-v2-m3 (self-hosted)"]
    end
    subgraph T6 ["T10/T11 · Infrastructure"]
        NATS_T["NATS JetStream (event spine)"]
        S3["S3 / R2 / MinIO (object storage)"]
    end
    subgraph T7 ["T12/T13/T14 · Cryptography + sync"]
        YJS["Yjs / Automerge (CRDTs for realtime)"]
        CRYPTO["Ed25519 + scrypt key wrap + MMR + STH"]
        LEDGER["msgspec canonical JSON · binlog framing"]
    end
    subgraph T8 ["T15/T16 · Compliance + UX"]
        OPA["OPA + Conftest (policy)"]
        TRUST["Trust Center (cert hosting)"]
        BVP["Be Vietnam Pro · CyberSkill design system"]
    end
    T1 --> T2
    T2 --> T3
    T3 --> T4
    T4 --> T5
    T4 --> T6
    T4 --> T7
    T4 --> T8
    classDef t1 fill:#f9c64f,stroke:#9c750a
    classDef t2 fill:#e8d4c2,stroke:#45210e
    classDef t3 fill:#fef6e0,stroke:#9c750a
    classDef t4 fill:#f5ede6,stroke:#45210e
    classDef t5 fill:#cba88a,stroke:#45210e
    classDef t6 fill:#fde7b3,stroke:#9c750a
    classDef t7 fill:#fee2e2,stroke:#b91c1c
    classDef t8 fill:#f0eee9,stroke:#475569
    class LANG,SKILLS,LITELLM_T1 t1
    class HOST,REMOTES t2
    class APOLLO,MCPGW,AIGW t3
    class SUBGRAPHS,MCPSERVERS t4
    class PG,EMBED t5
    class NATS_T,S3 t6
    class YJS,CRYPTO,LEDGER t7
    class OPA,TRUST,BVP t8
```

Three design constraints driving every pick

  1. Vietnamese data sovereignty — no SaaS dependency where Vietnamese-origin personal data must travel through a US-based vendor's servers. AWS Bedrock is acceptable because of the ap-southeast-1 region; OpenAI direct is not (no Singapore endpoint).
  2. Cost ceiling at scale — ≤ $150/mo LLM + $230/mo infra at 10-Member internal; ≤ $4/active user/mo LLM + $2,200/mo infra at 50-tenant (N(FR pending), N(FR pending)). Anything that doesn't fit that envelope is rejected.
  3. Migration door always open — every pick has a documented escape hatch. Storage is S3-compatible (DEC-005), so R2 ↔ MinIO ↔ AWS S3 is a config flip. SQL is portable, the audit chain is exportable, and MCP servers are spec-conforming.

Tier 1 — Persona / Agent layer

Pick: LangGraph (state graph) + LiteLLM (routing) + Anthropic Skills format (10 C-level skills hot-reloadable).

LangGraph is the agentic-supervisor framework. Its StateGraph primitive models CUO as a graph of nodes (router, skill-load, tool-call, HITL-confirm, answer-compose) with first-class interrupt() support for human-in-the-loop gates on destructive tools. LiteLLM is the model-routing core that owns provider failover and prompt caching. Anthropic Skills format (SKILL.md + scripts/ + references/) keeps each C-level skill as a hot-reloadable directory.
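The interrupt/resume mechanics are easier to see than to describe. A minimal sketch, assuming langgraph ≥ 0.2; the node names, state schema, and the `inv.void_invoice` tool are illustrative, not the shipped supervisor graph:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt

class CuoState(TypedDict, total=False):
    query: str
    tool: str
    destructive: bool
    answer: str

def route(state: CuoState) -> CuoState:
    # A Haiku-tier LLM call (via LiteLLM) would pick the tool here.
    return {"tool": "inv.void_invoice", "destructive": True}

def confirm(state: CuoState) -> CuoState:
    # interrupt() suspends the run; resuming the thread supplies the
    # operator's decision as the return value.
    decision = interrupt({"tool": state["tool"], "reason": "destructive tool"})
    return {} if decision else {"answer": "Cancelled by operator."}

def answer(state: CuoState) -> CuoState:
    return {"answer": state.get("answer") or f"ran {state['tool']}"}

g = StateGraph(CuoState)
g.add_node("route", route)
g.add_node("confirm", confirm)
g.add_node("answer", answer)
g.add_edge(START, "route")
g.add_conditional_edges("route", lambda s: "confirm" if s["destructive"] else "answer")
g.add_edge("confirm", "answer")
g.add_edge("answer", END)
# A checkpointer is what makes interrupt/resume (and HITL gates) possible.
graph = g.compile(checkpointer=MemorySaver())
```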

Why these picks

  • LangGraph, not DSPy / native LangChain — StateGraph models CUO's interrupt/resume semantics natively; DSPy is optimisation-first (signature → optimiser pipeline) and not a fit for a long-lived router; native LangChain agents are imperative and harder to audit.
  • LiteLLM — MIT-licensed, OpenAI-shaped surface, 100+ providers behind a single API, drop-in OTel hooks.
  • Anthropic Skills format — open standard (Anthropic + VS Code + Cursor + Codex compat); hot-reload via watchexec; metadata-first dispatch keeps context window lean.

Trade-offs

  • LangGraph cost: Python-only (no Rust); requires an LLM API call per supervisor decision (mitigated by Haiku-tier routing).
  • Skill format: not yet a published RFC; depends on Anthropic continuing to evolve it. Mitigation: schema-pinned to the 2025-04 spec; conformance tests in CI.
  • LiteLLM fork risk: CyberOS-specific middleware (Presidio redaction, VN identifier rules, persona stamping) lives on a fork branch, rebased monthly.

Production cost

$0 incremental host cost (the supervisor runs inside the CUO service Pod). LLM cost flows through the Tier 6 AI Gateway.


Tier 2 — Frontend host shell

Pick: Vite + React 19 + Tauri (desktop bundling).

The host shell is the thin orchestrator that hosts Module-Federation remotes. Vite for dev-server speed (with Rollup-based production builds); React 19 for the new use() hook and the React Compiler's automatic memoisation; Tauri for the desktop bundling story.

Why these picks

  • Vite — sub-second HMR; ESM-native; Rollup-based prod build; Module Federation plugin (@module-federation/vite) stable in late 2025.
  • React 19 — React Compiler removes most useMemo/useCallback ceremony; use() hook simplifies suspense; concurrent rendering is GA.
  • Tauri 2 (Rust) — 3-10 MB bundle vs Electron's 100+ MB; uses OS webview; passes Apple notarisation by default.

Trade-offs

  • Webview consistency: Tauri uses OS-native webviews (Edge WebView2, WKWebView, WebKitGTK) → CSS feature parity needs CI matrix testing.
  • Module Federation + Vite: the official plugin is newer than Webpack's; some plugins lag behind. Mitigation: host shell built around the official plugin's stable subset only.
  • React 19 ecosystem: some libraries (charts, drag-drop) still pinning React 18; monitor and pin as needed.

Tier 3 — Frontend remotes (per module)

Pick: Webpack 5 + Module Federation v2 per module.

Each module ships as an MF remote bundle. The host shell lazy-loads on route entry. CSS scoped via CSS Modules to prevent cross-module collisions. Design tokens come from one published package (@cyberskill/tokens).

Why Webpack for remotes, Vite for host?

  • Module Federation maturity — Webpack's MF plugin has 4+ years of production miles; remote bundling, runtime versioning, and shared-deps resolution are battle-tested.
  • Bidirectional remotes — Webpack MF supports remote-as-host (a module can host its own sub-remotes); useful for SKILL → CUO interactions.
  • Host can be Vite — the Vite MF plugin can consume Webpack remotes; the inverse is not always true.

NFR ceilings

  • N(FR pending) — module first-paint ≤ 1.5s on cold load
  • N(FR pending) — initial module bundle ≤ 50 KB gzipped JS
  • N(FR pending) — module-rebuild time on token change ≤ 30 min including tests

Tier 4 — API gateway (GraphQL Federation)

Pick: Apollo Router (Rust) — Federation v2.5+ compliant.

The Rust-based Apollo Router executes the composed supergraph plan: it validates JWTs, attaches tenant + actor context, runs persisted-query lookup, and dispatches subgraph fan-out in parallel. Detailed in the Infrastructure page.

Why Apollo Router, not Mesh / Hasura / Yoga?

  • Apollo Router — production-grade Rust runtime; reference implementation for Federation v2.5; query plan cache; persisted-query story is first-class.
  • GraphQL Mesh — flexible, but the wrong abstraction for federation (REST/SOAP/SQL adapters, not a router).
  • Hasura — Postgres-first; doesn't model multi-subgraph federation; vendor lock-in concerns.
  • GraphQL Yoga (as router) — Yoga is great as a subgraph server; not as a federation router.

Trade-offs

  • Elastic License v2 (ELv2) — non-OSI but production-friendly; review legal before any commercial offering.
  • Single binary — Rust-only; tweaks require Rust expertise (or YAML config + Rhai script).
  • Telemetry surface — needs OTel collector + Grafana to be useful (already in OBS).

Tier 5 — MCP Gateway (per-module servers)

Pick: Per-module MCP servers + federation router · 2025-11-25 spec.

Each module owns its MCP server. The gateway is a federation router, not a monolith. Streamable HTTP, OAuth-PRM, well-known discovery, tool annotations. Detailed in the Infrastructure page.

SDK choice

  • TypeScript: official @modelcontextprotocol/sdk (1.20+) — used by 19 of 22 modules
  • Rust: CyberSkill-published mcp-rs — used by BRAIN, Skill (the two Rust-first modules)
  • Tool registration — annotation-validated at startup; CI conformance test in module-template

Spec version policy

  • Current: 2025-11-25 (production-stable as of May 2026)
  • Previous: 2025-06-18 (one phase grace period after spec bump)
  • Tested clients: Claude Desktop, Claude Code, Cursor, Cline, Codex (covers ~26 AI clients)

Tier 6 — AI Gateway (LiteLLM router)

Pick: LiteLLM with CyberOS middleware overlay.

One gateway, one cost ledger, one residency policy. The router fails over between Bedrock primary (Sonnet 4.6, Haiku 4.5), Anthropic ZDR fallback, and OpenAI ZDR fallback. Detailed in the Infrastructure page.
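The failover chain is plain LiteLLM router configuration. A minimal sketch, assuming litellm ≥ 1.40; the model IDs, region, and deployment parameters are placeholders, not the production config:

```python
from litellm import Router

router = Router(
    model_list=[
        {   # primary: Anthropic models via Bedrock, Singapore region
            "model_name": "chat-default",
            "litellm_params": {
                "model": "bedrock/anthropic.claude-sonnet-4-20250514-v1:0",  # placeholder ID
                "aws_region_name": "ap-southeast-1",
            },
        },
        {   # second tier: Anthropic direct at the ZDR contract tier
            "model_name": "chat-anthropic-zdr",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"},  # placeholder ID
        },
        {   # last resort, enabled per-tenant only
            "model_name": "chat-openai-zdr",
            "litellm_params": {"model": "openai/gpt-4o"},  # placeholder ID
        },
    ],
    # On provider failure, walk the chain in order.
    fallbacks=[{"chat-default": ["chat-anthropic-zdr", "chat-openai-zdr"]}],
)

resp = router.completion(
    model="chat-default",
    messages=[{"role": "user", "content": "Xin chào"}],
)
```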

Why these providers

  • Bedrock primary — ZDR by default, regional Singapore endpoint for VN tenants, Anthropic Sonnet/Haiku available without contract overhead.
  • Anthropic ZDR fallback — direct API at Anthropic's Zero Data Retention contract tier; bypasses Bedrock when AWS has regional issues.
  • OpenAI ZDR fallback — third tier; explicitly enabled per-tenant; covers Bedrock + Anthropic dual outage.
  • No Gemini for now — no Singapore ZDR endpoint at suitable contract; revisit when GCP Singapore Gemini Enterprise ships.

Cost ceiling

  • N(FR pending) — ≤ $150/mo internal LLM
  • N(FR pending) — ≤ $4/active user/mo at 50-tenant
  • Semantic + exact-prompt cache ≥ 30% hit rate
  • Self-hosted BGE-M3 embeddings on shared GPU node (~$80/mo for GPU)

Tier 7 — Backend services (per-module subgraphs)

Pick: TypeScript (GraphQL Yoga) or Rust (async-graphql) per subgraph, choice owned by module owner.

Most subgraphs are TypeScript for developer ergonomics. The two performance-critical modules — BRAIN (memory writer hot path) and Skill (Wasmtime runtime, capability broker) — are Rust. Module owners pick per module; the contract (Apollo Federation SDL) is identical regardless.

TypeScript stack (default)

  • Runtime: Bun 1.2+ (faster startup, native TS/JS)
  • GraphQL server: GraphQL Yoga (maintained by The Guild, federation-aware)
  • ORM: Prisma 5 (the audit-event schema in PRD §8.7 is a Prisma model)
  • Validation: Zod + GraphQL codegen
  • Testing: Vitest + Playwright (component + e2e)

Rust stack (perf-critical)

  • Runtime: Tokio + axum
  • GraphQL server: async-graphql (federation v2 supported)
  • ORM: SQLx (compile-time-checked queries)
  • Validation: serde, with msgspec-compatible canonical-JSON encoding for ledger entries
  • Testing: cargo test + criterion benchmarks

Tier 8 — Data layer (Postgres + extensions)

Pick: PostgreSQL 17 + pgvector HNSW + Apache AGE 1.5 + PGroonga.

One Postgres database per region, with extensions stacked: pgvector for vector search (HNSW index), Apache AGE for graph traversal (OpenCypher dialect), PGroonga for Vietnamese-tokenised lexical search. Per-module schema isolation; RLS on every tenant-keyed table.
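The RLS posture and the "Joins" argument below come together in one request pattern. A minimal sketch, assuming psycopg 3; the table and column names (memory_chunks, embedding, scope, classification) are illustrative, not the shipped schema:

```python
import psycopg

qvec = str([0.0] * 1024)  # stands in for a 1024-dim BGE-M3 query embedding

with psycopg.connect("dbname=cyberos") as conn:
    with conn.transaction(), conn.cursor() as cur:
        # RLS scoping: set_config(..., true) keeps the GUC local to this
        # transaction, so every policy below sees exactly one tenant.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", ("cyberskill",))
        # The join argument in one query: pgvector HNSW distance plus
        # structured filters, with no second round-trip to a vector store.
        cur.execute(
            """
            SELECT id, path, embedding <=> %s::vector AS distance
            FROM memory_chunks
            WHERE scope = %s AND classification <= %s
            ORDER BY embedding <=> %s::vector
            LIMIT 150
            """,
            (qvec, "team", 2, qvec),
        )
        candidates = cur.fetchall()  # top-150 go to the Tier 9 reranker
```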

Why Postgres + pgvector vs separate vector DB?

  • Joins. CUO queries routinely combine a vector hit with a structured filter (tenant_id, scope, classification). Vector-DB-only solutions (Pinecone, Weaviate) turn this into two round-trips.
  • Transactional. Embeddings must be written atomically with the source row. Two-system writes need 2PC; one Postgres avoids it.
  • Cost. Pinecone scales linearly with vectors; Postgres scales with hardware. At 1M chunks the cost flip favours Postgres.
  • HNSW maturity. pgvector 0.7+ HNSW index now matches Pinecone recall on MIRACL benchmarks for VN content.

RLS posture

  • Every tenant-keyed table has RLS ENABLED + FORCE ROW LEVEL SECURITY.
  • Session GUC: SET LOCAL app.tenant_id = $1 at session start.
  • Bypass: only the migration runner has BYPASSRLS; standard app role does not.
  • Audit: RLS policy violations are blocked at DB level (logged + alerted).

Tier 9 — Search & embeddings (self-hosted)

Pick: BAAI/bge-m3 (embedder) + BAAI/bge-reranker-v2-m3 (reranker), self-hosted on one shared GPU node.

BGE-M3 produces 1024-dim dense + sparse + multi-vector embeddings in one pass. Multilingual native (top MIRACL Vietnamese scores). The reranker re-orders top-150 hits to top-20 using cross-encoder scoring. Both run on a single shared GPU node (Hetzner CCX23 + RTX A4000 or similar; ~$80/mo).
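A minimal sketch of the embed-then-rerank hot path, assuming the FlagEmbedding package; batching, truncation, and the serving wrapper around the GPU node are elided:

```python
from FlagEmbedding import BGEM3FlagModel, FlagReranker

embedder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

query = "chính sách nghỉ phép năm"  # Vietnamese-native queries are first-class
out = embedder.encode([query], return_dense=True, return_sparse=True)
dense_vec = out["dense_vecs"][0]    # 1024-dim; goes to pgvector HNSW (Tier 8)

# After pgvector returns the top-150 candidates, the cross-encoder reorders:
candidates = ["Điều 113: người lao động ...", "unrelated passage ..."]
scores = reranker.compute_score([[query, doc] for doc in candidates])
top_20 = sorted(zip(scores, candidates), reverse=True)[:20]
```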

Why self-hosted vs OpenAI text-embedding-3-large?

  • Cost. OpenAI embeddings at scale (50-tenant × 100k chunks/tenant × 30 reembeds/year) dominate the LLM bill. Self-hosted GPU is a fixed $80/mo.
  • Latency. Sub-30ms p50 vs ~150ms via OpenAI (network + queue).
  • Vietnamese quality. BGE-M3 outscores OpenAI on MIRACL-VI (Vietnamese subset) by ~8 points.
  • Residency. No data leaves CyberOS-controlled hardware. Compliance Q&A answers itself.

NFR ceilings

  • N(FR pending) — BRAIN search ≤ 250ms p95 on 1M chunks
  • Embed p95 ≤ 80ms; rerank p95 ≤ 200ms
  • End-to-end retrieve: embed + pgvector + rerank ≤ 250ms p95

Tier 10 — Event bus (NATS JetStream)

Pick: NATS Server 2.10+ with JetStream durable consumers.

Detailed in the Infrastructure page. Choice driven by subject-hierarchy native fit (cyberos.{tenant}.{module}.{entity}.{verb}), sub-millisecond latency, and single-binary operational footprint.
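A minimal sketch of the subject-hierarchy fit, assuming the nats-py client; the stream name, durable name, and payload shape are illustrative:

```python
import asyncio
import json
import nats

async def main():
    nc = await nats.connect("nats://nats.internal:4222")
    js = nc.jetstream()
    # One durable stream captures everything under the hierarchy.
    await js.add_stream(name="CYBEROS", subjects=["cyberos.>"])
    # Subjects follow cyberos.{tenant}.{module}.{entity}.{verb}.
    await js.publish(
        "cyberos.cyberskill.inv.invoice.created",
        json.dumps({"id": "inv_123", "total": 4_500_000}).encode(),
    )
    # Consumers subscribe by wildcard, e.g. every INV event across tenants:
    sub = await js.subscribe("cyberos.*.inv.>", durable="inv-audit")
    msg = await sub.next_msg(timeout=5)
    await msg.ack()
    await nc.close()

asyncio.run(main())
```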

Why NATS, not Kafka / Redpanda?

See the alternatives table at §Alternatives considered for the full comparison.


Tier 11 — Object storage (S3-compatible)

Pick: S3-compatible — Cloudflare R2 (zero-egress) or MinIO (self-host) per environment.

DEC-005 locks the choice to S3-compatible protocol, not specific vendor. Production internal uses Cloudflare R2 (zero egress fee, global CDN). Self-hosted demo / on-prem tenant uses MinIO. Migration is a config flip.
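The flip itself is one parameter. A minimal sketch with boto3; the endpoint hosts and bucket names are placeholders:

```python
import boto3

def object_store(env: str):
    # Same client code everywhere; only endpoint_url changes per DEC-005.
    endpoints = {
        "prod": "https://<account-id>.r2.cloudflarestorage.com",  # Cloudflare R2
        "onprem": "http://minio.internal:9000",                   # MinIO
        "aws": None,  # default AWS S3 endpoint
    }
    return boto3.client("s3", endpoint_url=endpoints[env])

s3 = object_store("prod")
s3.put_object(Bucket="brain-archive", Key="tenants/cyberskill/2026-05.zip", Body=b"...")
```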

  • R2 cost — $0 egress; $0.015/GB-month storage
  • MinIO — self-hosted; Apache-2.0; single binary
  • Use cases (5+) — BRAIN archival, OBS logs, INV PDFs, ESOP docs, DOC signed

Tier 12 — Realtime sync (CRDTs)

Pick: Yjs (CHAT, collaborative docs) + Automerge (offline-first complex models).

Yjs is the production-grade CRDT lib for text + lists (rich-text CHAT messages, KB docs). Automerge owns the offline-first model surface for clients that need to edit while disconnected (Tauri desktop ⇄ web). Both serialise to compact binary document formats; documents are converted at sync boundaries when needed.

Where they're used

  • CHAT — rich-text message editing, threaded reply collaboration (Yjs)
  • KB — collaborative document editing (Yjs + Tiptap editor)
  • PROJ — task description rich-text + checklist (Yjs)
  • Desktop offline — Tauri-based offline edits sync via Automerge when reconnected
  • BRAIN cross-tenant import — CRDT-style merge for `cyberos import` (memory module §14.2)

Tier 13 — Cryptography

Pick: Ed25519 signatures + scrypt key-wrap + Merkle Mountain Range (MMR) + Signed Tree Heads (STH).

BRAIN's audit ledger uses MMR for additive inclusion proofs. Each consolidation cycle signs a Tree Head with Ed25519. Signing keys are passphrase-wrapped via scrypt (P2 Stage 2). Detailed in BRAIN module page.
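A minimal sketch of the signing and key-wrap primitives, assuming the `cryptography` package; the MMR root computation is elided and the STH fields are illustrative:

```python
import hashlib
import json
import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()

# scrypt key wrap: derive a wrapping key from a passphrase.
# Memory-hard on purpose; deliberately expensive at unwrap time.
salt = os.urandom(16)
wrap_key = hashlib.scrypt(b"operator passphrase", salt=salt, n=2**15, r=8, p=1, dklen=32)
# ...wrap_key would encrypt the serialised signing key at rest (P2 Stage 2).

# Sign a Tree Head over the current MMR root after a consolidation cycle.
sth_payload = json.dumps(
    {"tree_size": 12345, "mmr_root": "sha256:def...", "ts_ns": 1715683200000000000},
    sort_keys=True, separators=(",", ":"),
).encode()
signature = signing_key.sign(sth_payload)  # 64-byte Ed25519 signature

# Anyone holding the public key can verify the STH offline.
signing_key.public_key().verify(signature, sth_payload)  # raises on mismatch
```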

Primitive choice

  • Ed25519, not RSA-4096 — 32-byte keys, deterministic, FIPS 186-5 approved.
  • SHA-256, not BLAKE2 — universally available, audit-time discoverable.
  • scrypt for key wrap — memory-hard; deliberately expensive at unwrap time.
  • MMR over Merkle tree — supports unbounded appends without re-balancing; matches Certificate Transparency Log convention.

Where used

  • BRAIN audit chain — every memory operation appends a leaf; STH signed per consolidation.
  • PRD §8.7 AuditEvent — per-scope Merkle chain; prevHash chained.
  • DEC-019 — Merkle-chained audit log invariant.
  • N(FR pending) — BRAIN signed-zip portability: Ed25519 sig + Merkle proof.

Tier 14 — Audit ledger encoding

Pick: msgspec canonical-JSON + binlog framing (length + CRC32C + seq + ts + payload).

msgspec (Python; mirrored in Rust via custom serde) produces deterministic JSON (sorted keys, UTF-8 NFC, no insignificant whitespace) → meets RFC 8785 JCS. The binary frame header makes the ledger durable under partial-write conditions.

Format spec (Memory AGENTS.md §6.2)

```
# Each ledger record frame:
[u32 length BE][u32 crc32c BE][u64 seq BE][u64 ts_ns BE][payload]

# Payload = msgspec canonical JSON of:
{
  "seq": 12345,
  "ts_ns": 1715683200000000000,
  "tenant": "cyberskill",
  "actor": "member:trinh",
  "op": "put",
  "path": "memories/decisions/...",
  "body_hash": "sha256:abc...",
  "prev_chain": "sha256:xyz...",
  "chain": "sha256:def..."   # SHA-256(canonical(record_minus_chain) || prev_chain)
}
```
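A minimal sketch of the framing and chain-hash rules above, assuming msgspec ≥ 0.18 (where order="deterministic" yields sorted-key canonical output) and the google-crc32c package for CRC32C; field handling mirrors the example record:

```python
import hashlib
import struct
import msgspec
import google_crc32c

def chain_hash(record: dict, prev_chain: bytes) -> bytes:
    # chain = SHA-256(canonical(record_minus_chain) || prev_chain)
    body = {k: v for k, v in record.items() if k != "chain"}
    canonical = msgspec.json.encode(body, order="deterministic")
    return hashlib.sha256(canonical + prev_chain).digest()

def pack_frame(record: dict) -> bytes:
    payload = msgspec.json.encode(record, order="deterministic")
    header = struct.pack(
        ">IIQQ",                        # u32 length, u32 crc32c, u64 seq, u64 ts_ns (all BE)
        len(payload),
        google_crc32c.value(payload),   # CRC32C makes partial writes detectable
        record["seq"],
        record["ts_ns"],
    )
    return header + payload
```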

Tier 15 — Compliance tooling

Pick: OPA (Open Policy Agent) + Conftest + Trust Center (static site).

OPA enforces Rego policies across Kubernetes manifests, GraphQL operation directives, and IAM transitions. Conftest runs OPA in CI for declarative-file validation. Trust Center is a static site (Astro + MDX) hosting VPAT, SOC 2, ISO 27001, CSA STAR docs.

What OPA enforces

  • K8s admission — block deploys missing tenant labels, resource limits, network policies
  • GraphQL directives — every @sensitive field must have classification
  • IAM transitions — Founder role grants require dual-approval workflow
  • MCP tool annotations — destructive tool registration validation

Trust Center stack

  • Astro — static site generator with MDX support
  • Auth — NDA click-wrap before downloading reports (AUTH module integration)
  • Signed URLs — 24-hour TTL via R2 presigned URLs
  • Audit — every download logged for N(FR pending) compliance

Tier 16 — Typography & design tokens

Pick: Be Vietnam Pro (UI) + JetBrains Mono (code) + CyberSkill Global Design System v1.0.0.

Be Vietnam Pro is the diacritic-aware Vietnamese-first typeface; the Design System Part 5 specifies stack-fidelity (N(FR pending)). Tokens are exported in W3C DTCG format (2025.10) for cross-platform consumption (Style Dictionary, Tailwind via PostCSS plugin, iOS/Android, Figma).

Token surface

  • Anchors — Umber #45210E, Ochre #F4BA17, sub-brand accents
  • Typography — Be Vietnam Pro (UI), JetBrains Mono (code)
  • Spacing rhythm — 4px base; powers-of-two scale
  • Genie token set — dedicated panel/chip/mode-indicator tokens; versioned alongside CUO persona

"What calls what" — dependency graph

The tiers compose left-to-right: every request from a user or agent flows through this graph. Cycles are forbidden by design.

One request traverses every tier — sequence view
```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant T2 as T2 Host shell
    participant T3 as T3 Module remote
    participant T4 as T4 Apollo Router
    participant T7 as T7 Subgraph (Bun/Tokio)
    participant T8 as T8 Postgres + ext
    participant T6 as T6 AI Gateway
    participant T9 as T9 BGE-M3 (GPU)
    participant T10 as T10 NATS
    participant T11 as T11 R2
    participant T14 as T14 Audit ledger
    U->>T2: open route
    T2->>T3: lazy-load remote
    T3->>T4: persisted query hash
    T4->>T7: federated query plan
    T7->>T8: SELECT (RLS-scoped)
    T8-->>T7: rows
    T7->>T6: POST /v1/embeddings
    T6->>T9: BGE-M3 self-hosted
    T9-->>T6: vector
    T6-->>T7: embed
    T7->>T8: SELECT pgvector
    T8-->>T7: hits
    T7->>T10: publish event
    T7->>T11: write attachment (if any)
    T7->>T14: append audit row (msgspec canonical)
    T7-->>T4: response
    T4-->>T3: composed result
    T3-->>T2: render
    T2-->>U: paint
```
```mermaid
flowchart LR
    USER[("User · Agent")] --> HOST["Host shell<br/>Vite + React 19"]
    HOST --> REMOTE["Module remote<br/>Webpack 5 + MF"]
    REMOTE --> APOLLO["Apollo Router"]
    APOLLO --> AUTH["AUTH JWKS"]
    APOLLO --> SUBG["Subgraph (TS/Rust)"]
    SUBG --> PG[("Postgres 17 + ext.")]
    SUBG --> AIGW["AI Gateway"]
    SUBG --> NATS_DEP[("NATS JetStream")]
    SUBG --> S3_DEP[("R2 / MinIO")]
    AIGW --> LL["LiteLLM"]
    LL --> BEDROCK["AWS Bedrock"]
    LL --> ANT["Anthropic ZDR"]
    LL --> OAI["OpenAI ZDR"]
    LL --> BGE["BGE-M3 (self-hosted GPU)"]
    USER --> MCPCLT[("MCP client<br/>Claude / Cursor")]
    MCPCLT --> MCPGW["MCP Gateway"]
    MCPGW --> AUTH
    MCPGW --> SUBG
    SUBG -. trace .-> OBS["OBS · OTel"]
    APOLLO -. trace .-> OBS
    AIGW -. trace .-> OBS
    MCPGW -. trace .-> OBS
    classDef u fill:#fef6e0,stroke:#9c750a
    classDef fe fill:#e8d4c2,stroke:#45210e
    classDef gw fill:#f9c64f,stroke:#9c750a
    classDef be fill:#f5ede6,stroke:#45210e
    classDef data fill:#cba88a,stroke:#45210e
    classDef ext fill:#fde7b3,stroke:#9c750a
    class USER,MCPCLT u
    class HOST,REMOTE fe
    class APOLLO,MCPGW,AIGW,AUTH gw
    class SUBG be
    class PG,NATS_DEP,S3_DEP data
    class LL,BEDROCK,ANT,OAI,BGE,OBS ext
```

Cost-vs-tier model

Two reference scales — 10 Members internal (P0–P2) and 50 tenants (P4 GA). Each tier's contribution maps to a hard NFR ceiling.

Cost flow — where the dollar goes at internal scale
```mermaid
flowchart LR
    BUDGET[("$535/mo<br/>N(FR pending) envelope")] --> LLM["28% · LLM<br/>$150 · primarily Sonnet + Haiku via Bedrock"]
    BUDGET --> COMPUTE["17% · K8s compute<br/>$90 · 22 subgraphs + gateways"]
    BUDGET --> PG["15% · Postgres<br/>$80 · primary + read replica"]
    BUDGET --> OBS_C["15% · OBS (LGTM)<br/>$80 · Loki + Tempo + Mimir + Grafana"]
    BUDGET --> GPU["15% · GPU embed<br/>$80 · shared BGE-M3 node"]
    BUDGET --> STORE["5% · object storage<br/>$25 · R2 zero-egress"]
    BUDGET --> NATS_C["4% · NATS<br/>$20 · single-node JetStream"]
    BUDGET --> EDGE["1% · CDN + auth<br/>$10"]
    classDef envelope fill:#fef6e0,stroke:#9c750a,stroke-width:2px
    classDef llm fill:#f9c64f,stroke:#9c750a
    classDef compute fill:#f5ede6,stroke:#45210e
    classDef data fill:#e8d4c2,stroke:#45210e
    classDef obs fill:#fde7b3,stroke:#9c750a
    classDef ext fill:#cba88a,stroke:#45210e
    class BUDGET envelope
    class LLM,GPU llm
    class COMPUTE compute
    class PG,STORE data
    class OBS_C obs
    class NATS_C,EDGE ext
```
xychart-beta title "Monthly cost per tier ($USD)" x-axis ["LLM (T1+T6)", "Postgres (T8)", "Compute (T7)", "Storage (T11)", "OBS (LGTM)", "NATS (T10)", "AUTH (T4-side)", "Embeddings GPU (T9)"] y-axis "USD per month" 0 --> 600 bar [150, 80, 90, 25, 80, 20, 5, 80]
Internal scale (10 Members) — total ≤ $530/mo against N(FR pending) budget of $530/mo ($150 LLM + $380 infra)
xychart-beta title "Cost shape at 50-tenant scale ($USD/month)" x-axis ["LLM", "Postgres (3 regions)", "Compute (k8s)", "Storage", "OBS", "NATS (cluster)", "AUTH", "GPU embed"] y-axis "USD per month" 0 --> 1400 bar [800, 600, 500, 200, 200, 100, 50, 200]
50-tenant scale — total ≤ $2,650/mo against the N(FR pending) budget of $2,200/mo infra + $4/active user/mo LLM

Per-tier production cost (M+6, M+18 projections)

| Tier | Pick | Internal (10 Members) | 50-tenant scale | Migration door |
|---|---|---|---|---|
| T1 Persona / Agent | LangGraph + LiteLLM | $0 host | $0 host | Replace LangGraph supervisor |
| T2 Host shell | Vite + React 19 + Tauri | $5/mo CDN | $50/mo CDN | Switch host to Next.js |
| T3 Module remotes | Webpack 5 + MF | included | included | Pin MF v2 spec |
| T4 Apollo Router | Apollo Router | $0 (OSS binary) | $50/mo VM cluster | Elastic License v2 review |
| T5 MCP Gateway | Custom router + per-module servers | $0 (in-cluster) | $30/mo | MCP spec preserves portability |
| T6 AI Gateway | LiteLLM + Bedrock primary | $150/mo | $800/mo (+ per-user) | Provider mix via config |
| T7 Backend | Bun / Tokio · 22 subgraphs | $90/mo k8s | $500/mo k8s | Containers, portable |
| T8 Data | Postgres 17 + pgvector + AGE + PGroonga | $80/mo | $600/mo (3 regions) | SQL portable |
| T9 Embeddings | BGE-M3 + reranker (GPU) | $80/mo | $200/mo (multi-GPU) | Switch to OpenAI text-embed |
| T10 Event bus | NATS JetStream | $20/mo VM | $100/mo cluster | NATS subjects → Kafka topics |
| T11 Object storage | R2 / MinIO | $25/mo | $200/mo | S3-compatible config flip |
| T12 CRDT sync | Yjs / Automerge | $0 (libs) | $0 (libs) | Doc-format-portable |
| T13–14 Cryptography + Ledger | Ed25519 + MMR + msgspec | $0 | $0 | Schema-portable |
| T15 Compliance | OPA + Trust Center | $5/mo static host | $20/mo | OPA Rego portable |
| OBS (LGTM) | Grafana / Loki / Tempo / Mimir | $80/mo | $200/mo | OTel-native; switch backend |
| Total | | ≤ $535/mo | ~$2,750/mo | |

Alternatives considered (per major pick)

Five major architectural picks deserve an explicit alternatives table. Each rejected option has a documented rejection rationale and a "would reconsider when..." trigger.

Postgres + pgvector — vs separate vector DB

| Option | Pros | Cons | Status |
|---|---|---|---|
| Postgres + pgvector HNSW | One DB; transactional embed-writes; structural joins; cheap; VN tokenisation via PGroonga | Operational complexity (more extensions); ~10% slower than a dedicated vector DB at very large scale | SELECTED |
| Pinecone | Best-in-class recall; managed; horizontal scaling | Vendor lock; egress fees; no Singapore region; not transactional with the source of truth | Rejected — sovereignty + cost |
| Weaviate | OSS; multi-modal; GraphQL native | Memory-heavy; Janus runtime; embedded mode not prod-grade for 1M+ chunks | Rejected — operational cost |
| Qdrant | Rust-native; fast; OSS | Two-system writes still required; smaller community | Reconsider if pgvector p95 fails N(FR pending) |
| Milvus / Zilliz | Scales to billions; cloud-native | K8s-heavy ops; same two-system issue | Out of scope for 10–50 tenant scale |

Apollo Federation v2 — vs REST / gRPC / single GraphQL

| Option | Pros | Cons | Status |
|---|---|---|---|
| Apollo Federation v2.5+ | Per-module subgraph ownership; single agent surface; persisted query budget; query plan cache | Apollo Router Elastic License (non-OSI); Rust expertise to extend | SELECTED |
| REST per module | Universal; cacheable | N round-trips per page; no agent-friendly introspection; N OpenAPIs to maintain | Rejected — agent ergonomics |
| gRPC + ConnectRPC | Strongly typed; fast; protobuf schema | Browser story still weak; no native cross-subgraph composition; agents don't speak gRPC natively | Rejected — frontend friction |
| Single monolithic GraphQL | Single schema | Merge conflicts every PR; team coupling; deploy coupling | Rejected — team scale |
| tRPC | Excellent DX; TS end-to-end | TS-only; no agent-facing surface; per-module schemas don't compose | Rejected — language lock |

LangGraph — vs DSPy / native LangChain / Semantic Kernel

| Option | Pros | Cons | Status |
|---|---|---|---|
| LangGraph | StateGraph native; interrupt() HITL; checkpointer for resumability; LangSmith tracing | Python-only; ties to LangChain ecosystem | SELECTED |
| DSPy | Optimisation-first; auto-prompting | Not a router framework; long-lived agent loop awkward | Reconsider for batch evals only |
| Native LangChain agents | Mature; vast tool catalog | Imperative loop; hard to audit; HITL via callbacks is brittle | Rejected — auditability |
| Semantic Kernel | C# / Python native; Microsoft-backed | Smaller community; MS ecosystem bias | Rejected — community size |
| CrewAI | Multi-agent ergonomics | Less mature; HITL gates not first-class | Watching |

NATS JetStream — vs Kafka / Redpanda

| Option | Pros | Cons | Status |
|---|---|---|---|
| NATS JetStream | Subject hierarchy native; sub-ms latency; single 50 MB binary; runs on a $20/mo VM at internal scale | Smaller community than Kafka; less tooling around DLQ replay | SELECTED |
| Apache Kafka | Industry standard; massive tooling; Confluent SaaS available | JVM ops complexity; flat topics; ZooKeeper/KRaft cluster mandatory; expensive at 10-Member scale | Rejected — footprint |
| Redpanda | Kafka-protocol-compatible; C++; lower ops cost | Same flat-topic model; less mature than Kafka | Reconsider if Kafka tooling needed |
| AWS SQS / EventBridge | Managed; pay-per-message | Vendor lock; no subject hierarchy; latency 30–100 ms typical | Rejected — sovereignty |
| Apache Pulsar | Multi-tenant native; geo-replication | Operational complexity (BookKeeper); overkill at our scale | Out of scope |

Tauri — vs Electron / Wails / Native

| Option | Pros | Cons | Status |
|---|---|---|---|
| Tauri 2 | 3–10 MB bundle (vs Electron's 100+ MB); Rust backend; OS webview; passes Apple notarisation by default | OS-native webview means a CSS testing matrix; Rust IPC layer to learn | SELECTED |
| Electron | Mature; Chrome consistency; vast plugin ecosystem | 100+ MB bundle; memory-hungry; security-update treadmill | Rejected — bundle size |
| Wails (Go backend) | Go single-binary feel; webview-based | Smaller community; v3 still maturing | Reconsider |
| Native (Swift / WinUI / GTK) | Best UX; smallest bundles | 3× the code; can't reuse React components | Rejected — scope |
| Web-only (PWA) | No native ship | No file-system access; no native notification UX | Reconsider at P3 mobile evaluation |