KB is CyberOS's documentation surface and the canonical source for AI-grounded retrieval. The data model is simple: a Document has a slug, a markdown body, YAML frontmatter, a category, a permission tier, and a chain of Versions. Every save produces a new immutable version with a chained audit row in BRAIN. The renderer produces sanitised HTML (server-side) for human reading and a clean plaintext stream for BRAIN ingestion. Search is a three-layer stack: FTS5 / PGroonga (Vietnamese bigram tokenisation) for lexical, BGE-M3 embeddings (BRAIN Layer 2) for semantic, and BGE-rerank-v2-m3 cross-encoder for re-ranking. "Ask this page" produces an answer grounded only in the current doc + explicitly linked docs, with span-level citations. Permissions: public · org-only · role-restricted (with share-link tokens for time-bound external access). Dual-language: a doc has a language field and an optional translation_of link to its counterpart.
Why KB exists
Three jobs in one module: (a) the docs surface a team needs for how-tos, runbooks, decision logs, and policies; (b) the canonical source for grounded AI retrieval — if you do not control where the AI answer comes from, you do not control what it says; (c) a per-doc permissioned publishing layer for Trust Center pages and client-shared documents. Off-the-shelf wikis (Notion, Confluence) handle (a) but treat (b) and (c) as afterthoughts. CyberOS treats (b) as the central design force: every KB document is a first-class BRAIN Layer 2 citation source, every Q&A answer cites span-level back to KB, and every "promote to canonical" elevates a doc to a high-authority BRAIN source. That property is only credible when KB owns its versioning, ACL, and ingestion path.
Every AI answer grounded in KB carries span-level citations. Hover the citation → see the exact source paragraph. Bad answers are auditable.
vi and en docs are linked via translation_of; the reader sees the language matching their JWT locale, and the AI grounds in both when relevant.
A user asking BRAIN a question only gets citations from KB docs they are allowed to read. ACL is enforced at retrieval time, not after.
The bet is that the docs surface and the AI retrieval surface are the same surface. The cost is that KB is more constrained than a free-form wiki — every doc has a category, every save is versioned, every ingestion respects ACL. The benefit is that "ask the KB" is a credible AI feature, not a vibes-shaped hallucination machine, because every answer cites a specific span of a specific doc.
What it does — 5W1H2C5M
A structured decomposition of KB's scope. Every cell traces back to PRD §9.12 and §19.7.
| Axis | Question | Answer |
|---|---|---|
| 5W · What | What is KB? | A markdown-source, server-rendered HTML, versioned, ACL'd documentation system that ingests into BRAIN Layer 2 for AI-grounded retrieval. Three-layer search (FTS5 + semantic + reranker). "Ask this page" with span-level citations. Dual-language vi + en. |
| 5W · Who | Who uses it? | Members: read + write docs daily. CDO seat: owns the surface; reviews "promote to canonical" requests. Members in a category role: can edit docs in their category. Trust Center readers: public-readable docs for opted-in tenants. Agents: KB is the primary grounded-retrieval target. |
| 5W · When | When does it run? | Continuous: SPA editor + reader. On every save: render → BRAIN ingest p95 ≤ 5 s. Nightly: dead-link detection; semantic index refresh on changed embeddings. |
| 5W · Where | Where does it run? | P1: single region (SG-1) with VN-residency RDS. P3+: multi-region read replicas. Source markdown is RDS + S3 (S3 for attachments and large binaries). |
| 5W · Why | Why a separate module? | Off-the-shelf wikis do not treat AI-grounded retrieval as a first-class concern; folding KB into BRAIN corrupts the BRAIN ingestion ledger; folding it into PROJ ties it to engagement-scoped lifetimes. Standalone module with tight BRAIN integration is the right shape. |
| 1H · How | How does it work? | Editor writes markdown + frontmatter; on save, the server validates, renders sanitised HTML, computes the diff against the prior version, creates a Version row, and queues BRAIN ingest. Search: FTS5 / PGroonga produces the top-100 lexical candidates, BGE-M3 cosine similarity narrows them to the top-30, and BGE-rerank-v2-m3 picks the top-10. Q&A: pull top spans, format the prompt, call AI Gateway with a citation-required system instruction. |
| 2C · Cost | Cost budget? | P1: ~$55 / month single-tenant pilot (Fargate + RDS + Redis + S3). Embedding cost ~$0.0001 / doc-version; reranker ~$0.0005 / query. 50-tenant: ~$220 / month. |
| 2C · Constraints | Constraints? | (a) BRAIN ingest p95 ≤ 5 s (FR pending). (b) Q&A must cite (FR pending). (c) Permissions enforced at retrieval time; an ACL leak via Q&A is a sev-0 bug. (d) Trust Center pages are public-readable only when explicitly opted in (FR pending). (e) Vietnamese-quality search ≥ 90% recall on a fixed evaluation corpus. |
| 5M · Materials | Stack? | Rust 1.81 · axum · sqlx · PostgreSQL 16 + PGroonga · pulldown-cmark for markdown · ammonia for HTML sanitisation · Redis 7 · S3 + KMS · BGE-M3 embedder + BGE-rerank-v2-m3 reranker (BRAIN-shared) · TipTap or CodeMirror for the editor · OpenTelemetry SDK. |
| 5M · Methods | Method choices? | Markdown source of truth (not block-based proprietary). Immutable versioning (no in-place edit). Server-side render (no client-side trust). FTS5 / PGroonga + semantic + reranker triple-layer (not just one). ACL at retrieval time (not at display time). |
| 5M · Machines | Deployment? | Fargate axum service. RDS Postgres Multi-AZ. PGroonga compiled into the RDS image. Redis hot cache. Embedding + reranker GPU node shared with BRAIN. |
| 5M · Manpower | Who maintains? | 0.4 FTE (CDO seat) at P1 launch + 0.1 FTE (CCO for Trust Center pages). CTO owns the engine. |
| 5M · Measurement | How measured? | Search p95 ≤ 350 ms, Q&A p95 ≤ 4 s end-to-end, citation accuracy ≥ 95% (claim → source span), BRAIN ingest lag p95 ≤ 5 s, Vietnamese-query recall ≥ 90%. |
Architecture
KB is one axum service with four surfaces (GraphQL subgraph, REST admin, public-readable HTML for Trust Center pages, MCP tool catalogue) and three stores (PostgreSQL canonical + PGroonga + FTS5, Redis hot cache, S3 for attachments). The renderer and the BRAIN ingester are separate concerns: the renderer produces HTML for humans, while the BRAIN ingester produces a sanitised plaintext + chunking stream for vectorisation.
[Architecture diagram. The opening of the diagram source is missing from this copy, so the nodes SPA, PUB, AGENT, GQL, REST, HTMLPUB, MCP, DOC and VER are known only by their identifiers.]
- Service components: RENDER (pulldown-cmark + ammonia), DIFF (diff engine), ACL (ACL gate: public · org · role), BRAIN_ING (BRAIN ingester: chunk + sanitise), SEARCH (search engine: FTS5 + semantic + rerank), QA (Q&A grounded composer), XLATE (translation linker), BACKLINK (backlink computer).
- BRAIN-shared: EMB (BGE-M3 embedder), RANK (BGE-rerank-v2-m3 reranker).
- Stores: PG (PostgreSQL + PGroonga: document · version · category, RLS by tenant_id), RED (Redis 7: rendered HTML cache · search cache), S3 (S3 + KMS: attachments).
- Sinks: BRAIN (Layer 2 ingestion · audit), AI (AI Gateway), OBS.
- Edges: SPA → GQL, REST. PUB → HTMLPUB. AGENT → MCP. GQL → DOC. REST → DOC, RENDER, SEARCH, QA. MCP → SEARCH, QA. HTMLPUB → ACL. DOC → VER, RENDER, DIFF, BRAIN_ING, ACL, XLATE, BACKLINK, PG, S3, OBS. BRAIN_ING → BRAIN, EMB. SEARCH → EMB, RANK. QA → AI, SEARCH. RENDER → RED.
- Status: every client, surface, and service node is marked planned.
Document categories (closed)
- how-to: Step-by-step instructions: "How to file a leave request", "How to onboard a Member".
- reference: Stable facts: API reference, role catalogue, rate cards, compliance citations.
- decision-log: Why we did X. Mirrors a BRAIN memories/decisions/ entry but human-readable.
- policy: Company policy: leave, compensation, security, code of conduct.
- runbook: Incident-response playbooks: AUTH key compromise, EMAIL Stalwart CVE, payroll outage.
- trust-center: Public-readable on opt-in tenants. DPA, sub-processor list, security overview, DMARC status.
Internal components
| Component | Path (planned) | Responsibility |
|---|---|---|
| document.rs | services/kb/src/document.rs | Document CRUD. Slug uniqueness per tenant. Frontmatter validation (kind, category, language, permission tier). |
| version.rs | services/kb/src/version.rs | Version archiver. Every save → new immutable row. Retains markdown + rendered HTML hash. |
| renderer.rs | services/kb/src/renderer.rs | Server-side markdown → HTML. Uses pulldown-cmark + ammonia (sanitise). No client-side JS execution. |
| diff.rs | services/kb/src/diff.rs | Unified diff between versions. Powers the version-history UI. |
| brain_ingest.rs | services/kb/src/brain_ingest.rs | On every version save: strip markdown, chunk at semantic boundaries, write BRAIN Layer 2 rows + embeddings. p95 ≤ 5 s (FR pending). |
| search.rs | services/kb/src/search.rs | Triple-layer search. FTS5 / PGroonga → top-100; BGE-M3 cosine top-30; BGE-rerank-v2-m3 → top-10. |
| qa.rs | services/kb/src/qa.rs | Q&A composer. Pull top spans → format prompt with citation-required system instruction → call AI Gateway → parse cited answer. |
| acl.rs | services/kb/src/acl.rs | ACL gate at retrieval time. Filters spans before they reach the QA composer. |
| share_link.rs | services/kb/src/share_link.rs | Time-bound share-link tokens for external readers (FR pending). |
| translation.rs | services/kb/src/translation.rs | Translation-of linkage. Reader sees doc in JWT-locale; AI grounds across language pairs. |
| backlink.rs | services/kb/src/backlink.rs | Backlink graph: "what links here" query. |
| promote.rs | services/kb/src/promote.rs | "Promote to canonical": elevates a doc to a high-authority BRAIN source (FR pending). Requires CDO approval. |
| notion_import.rs | services/kb/src/notion_import.rs | Notion-export ZIP import (FR pending). Preserves links + categories. |
| export.rs | services/kb/src/export.rs | Per-page or per-tree markdown export. |
| trust_center.rs | services/kb/src/trust_center.rs | Public-readable Trust Center pages. Opt-in per tenant (FR pending). |
| migrations/ | services/kb/migrations/ | sqlx migrations + PGroonga index DDL. RLS on every table. |
Data model
Documents have a slug, current version pointer, category, permission tier, language, and optional translation_of link. Versions are immutable; the document row's current_version_id points at the latest. Permissions cascade by category (tenant default) and can be tightened per doc.
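The relationship between a document, its immutable versions, and the current-version pointer can be sketched in a few lines. This is a minimal in-memory illustration, not the planned sqlx schema: the struct layout, the `create` / `save` / `restore` method names, and the use of a vector index in place of `current_version_id` are all assumptions made for the example.

```rust
/// Hypothetical in-memory stand-in for a Version row.
#[derive(Debug, Clone, PartialEq)]
pub struct Version {
    pub version_num: u32,
    pub markdown: String,
    pub change_summary: String,
}

/// Hypothetical in-memory stand-in for a Document row.
#[derive(Debug)]
pub struct Document {
    pub slug: String,
    pub language: String,               // "vi" or "en"
    pub translation_of: Option<String>, // slug of the counterpart, if any
    versions: Vec<Version>,             // append-only; rows are never mutated
    current: usize,                     // index standing in for current_version_id
}

impl Document {
    pub fn create(slug: &str, language: &str, markdown: &str) -> Self {
        let first = Version {
            version_num: 1,
            markdown: markdown.to_string(),
            change_summary: "initial".to_string(),
        };
        Document {
            slug: slug.to_string(),
            language: language.to_string(),
            translation_of: None,
            versions: vec![first],
            current: 0,
        }
    }

    /// Every save appends a new version and moves the pointer to it.
    pub fn save(&mut self, markdown: &str, change_summary: &str) -> &Version {
        let next = Version {
            version_num: self.versions.len() as u32 + 1,
            markdown: markdown.to_string(),
            change_summary: change_summary.to_string(),
        };
        self.versions.push(next);
        self.current = self.versions.len() - 1;
        &self.versions[self.current]
    }

    /// Restore only moves the pointer; no existing version is rewritten.
    pub fn restore(&mut self, version_num: u32) -> bool {
        match self.versions.iter().position(|v| v.version_num == version_num) {
            Some(idx) => {
                self.current = idx;
                true
            }
            None => false,
        }
    }

    pub fn current_version(&self) -> &Version {
        &self.versions[self.current]
    }

    pub fn version_count(&self) -> usize {
        self.versions.len()
    }
}
```

In this sketch a restore never deletes history: the version count stays the same and only the pointer changes, which is the property the immutable-versioning rule relies on.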
Permission tiers
| Tier | Visible to | Used for |
|---|---|---|
| public | Anyone with the URL (incl. anonymous if trust_center_published) | Trust Center pages, public marketing docs. |
| org | Any authenticated subject in the tenant | How-to, reference, decision-log (default). |
| role-restricted | Subjects holding one of allowed_role_codes | Policy (HR), runbook (CSO), compensation references. |
| explicit | Subjects in PERMISSION table | Per-doc carve-outs (e.g. specific Member can read role-restricted). |
| share-link | Anyone with the token, until expiry | Time-bound external sharing (client review of a doc). |
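The tier table reads naturally as a single decision function. The sketch below is one possible reading, not the planned acl.rs: the `Tier`, `Subject`, `DocAcl`, and `can_read` names are hypothetical, the share-link tier is left out because it is token-based, and treating `explicit` as a plain subject allow-list is a simplification of the per-doc carve-out described above. It also assumes anonymous readers see a public doc only when `trust_center_published` is set.

```rust
use std::collections::HashSet;

/// Tiers from the table above (share-link omitted; it is token-based).
#[derive(Debug, Clone, PartialEq)]
pub enum Tier {
    Public,
    Org,
    RoleRestricted { allowed_role_codes: Vec<String> },
    Explicit { subject_ids: Vec<String> },
}

pub struct Subject {
    pub id: String,
    pub tenant_id: String,
    pub role_codes: HashSet<String>,
}

impl Subject {
    pub fn new(id: &str, tenant_id: &str, roles: &[&str]) -> Self {
        Subject {
            id: id.to_string(),
            tenant_id: tenant_id.to_string(),
            role_codes: roles.iter().map(|r| r.to_string()).collect(),
        }
    }
}

pub struct DocAcl {
    pub tenant_id: String,
    pub tier: Tier,
    pub trust_center_published: bool,
}

/// `subject` is `None` for an anonymous reader.
pub fn can_read(subject: Option<&Subject>, doc: &DocAcl) -> bool {
    match (&doc.tier, subject) {
        // Public docs: any authenticated subject; anonymous only if published.
        (Tier::Public, Some(_)) => true,
        (Tier::Public, None) => doc.trust_center_published,
        // Every other tier requires an authenticated subject in the same tenant.
        (_, None) => false,
        (_, Some(s)) if s.tenant_id != doc.tenant_id => false,
        (Tier::Org, Some(_)) => true,
        (Tier::RoleRestricted { allowed_role_codes }, Some(s)) => {
            allowed_role_codes.iter().any(|r| s.role_codes.contains(r))
        }
        (Tier::Explicit { subject_ids }, Some(s)) => {
            subject_ids.iter().any(|id| *id == s.id)
        }
    }
}
```

Keeping the check as a pure function of subject and document makes it straightforward to call at retrieval time, before any span is handed to search or Q&A.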
API surface
Four surfaces: a federated GraphQL subgraph; a REST surface for editor + reader (with rendered HTML caching at the edge); a public HTML endpoint for Trust Center pages; and an MCP tool catalogue (search + ask) for CUO.
GraphQL subgraph
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.5", import: ["@key", "@requiresScopes"])

type Document @key(fields: "id") {
  id: ID!
  slug: String!
  title: String!
  category: Category!
  permission: PermissionTier!
  language: Language!
  translationOf: Document
  translations: [Document!]!
  currentVersion: Version!
  versions(limit: Int = 20): [Version!]!
  renderedHtml: String!
  tags: [String!]!
  attachments: [Attachment!]!
  backlinks: [Document!]!
  trustCenterPublished: Boolean!
  brainIngestState: BrainIngestState
  promotion: Promotion
}

type Version @key(fields: "id") {
  id: ID!
  documentId: ID!
  versionNum: Int!
  savedAt: DateTime!
  savedBy: Subject!
  changeSummary: String
  wordCount: Int!
  diffFrom(version: ID!): String!
}

type SearchResult {
  document: Document!
  snippet: String!
  score: Float!
}

type QAResult {
  question: String!
  answer: String!
  citations: [Citation!]!
  confidence: Float!
}

type Citation {
  documentId: ID!
  documentTitle: String!
  versionId: ID!
  charStart: Int!
  charEnd: Int!
  snippet: String!
}

enum Category { HOW_TO REFERENCE DECISION_LOG POLICY RUNBOOK TRUST_CENTER }
enum PermissionTier { PUBLIC ORG ROLE_RESTRICTED EXPLICIT SHARE_LINK }
enum Language { VI EN }

type Query {
  document(id: ID, slug: String): Document
  searchDocuments(query: String!, category: Category, limit: Int = 10): [SearchResult!]!
  askPage(documentId: ID!, question: String!): QAResult!
  askKb(question: String!, scope: AskKbScope): QAResult! @requiresScopes(scopes: [["kb.ask"]])
}

type Mutation {
  createDocument(input: CreateDocumentInput!): Document!
    @requiresScopes(scopes: [["kb.write"]])
  saveDocument(id: ID!, markdown: String!, changeSummary: String!): Version!
  setPermission(id: ID!, tier: PermissionTier!, allowedRoleCodes: [String!]): Document!
    @requiresScopes(scopes: [["kb.permission"]])
  promoteToCanonical(id: ID!, brainCanonicalPath: String!): Promotion!
    @requiresScopes(scopes: [["kb.promote"]])
  createShareLink(id: ID!, validUntil: DateTime!, maxViews: Int): ShareLinkResult!
  importNotionZip(zipS3Key: String!): NotionImportJob!
    @requiresScopes(scopes: [["kb.import"]])
}
REST surface
| Method | Path | Purpose |
|---|---|---|
| GET | /kb/{slug} | Render document as HTML (ACL-gated). |
| GET | /kb/{slug}.md | Markdown source download. |
| POST | /kb/{slug}/save | Save markdown + frontmatter. |
| GET | /kb/{slug}/versions/{n} | Render a specific version. |
| GET | /kb/{slug}/diff?from={a}&to={b} | Unified diff. |
| GET | /kb/search?q=…&cat=… | Triple-layer search. |
| POST | /kb/ask-page | Q&A grounded in this page + linked pages. |
| POST | /kb/ask | Q&A across whole KB (ACL-filtered). |
| GET | /trust-center/{slug} | Public read (opted-in tenants). |
| GET | /share/{token} | Share-link access. |
| POST | /admin/import/notion | Notion ZIP import. |
| POST | /admin/export/tree?root=… | Per-tree markdown export. |
MCP tool catalogue
| Tool name | Inputs | Outputs | Annotations |
|---|---|---|---|
| cyberos.kb.search | query, category?, limit | SearchResult[] | readonly · scope=kb.read |
| cyberos.kb.get_document | slug | Document | readonly · scope=kb.read |
| cyberos.kb.ask_page | document_id, question | QAResult | readonly · scope=kb.read |
| cyberos.kb.ask | question, scope? | QAResult | readonly · scope=kb.ask |
| cyberos.kb.list_versions | document_id | Version[] | readonly · scope=kb.read |
| cyberos.kb.diff | document_id, from, to | diff text | readonly · scope=kb.read |
| cyberos.kb.save | document_id, markdown, change_summary | Version | scope=kb.write |
| cyberos.kb.create_share_link | document_id, valid_until, max_views? | {token, url} | scope=kb.share |
| cyberos.kb.promote | document_id, brain_canonical_path | Promotion | destructive · human-confirm · scope=kb.promote |
Key flows
Flow 1 — Create / edit a doc with BRAIN re-ingest
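For the re-ingest step of this flow, the ingester is described as stripping markdown and chunking at semantic boundaries so that citations can point at exact spans. The sketch below is a simplified illustration under stated assumptions: it treats blank-line paragraph breaks as the "semantic boundary", assumes markdown has already been stripped, and counts offsets in Unicode scalar values rather than bytes so Vietnamese text is measured consistently. The `Span` type and `chunk_paragraphs` function are hypothetical names.

```rust
/// A chunk of plaintext with character offsets into the source text.
#[derive(Debug, PartialEq)]
pub struct Span {
    pub char_start: usize,
    pub char_end: usize, // exclusive
    pub text: String,
}

/// Split plaintext on blank lines and record where each paragraph sits.
/// Offsets count `char`s, not bytes.
pub fn chunk_paragraphs(plaintext: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut offset = 0usize; // char offset of the current piece
    for piece in plaintext.split("\n\n") {
        let piece_len = piece.chars().count();
        let trimmed = piece.trim();
        if !trimmed.is_empty() {
            // Leading whitespace that trim() removed shifts the start offset.
            let lead = piece_len - piece.trim_start().chars().count();
            let start = offset + lead;
            spans.push(Span {
                char_start: start,
                char_end: start + trimmed.chars().count(),
                text: trimmed.to_string(),
            });
        }
        offset += piece_len + 2; // the two newline chars of the separator
    }
    spans
}
```

Each span can be re-derived from the source by skipping `char_start` characters and taking `char_end - char_start`, which is the property a span-level citation needs. A production chunker would likely also respect headings, lists, and size limits.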
Flow 2 — Triple-layer search
ACL is applied before reranking, never after. A doc the user cannot read never reaches the reranker, the QA composer, or the UI.
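The ordering rule above can be illustrated with a toy pipeline. This is a sketch under assumptions, not the planned search.rs: the `Chunk` fields, the precomputed `readable` flag and `lexical` score, and the caller-supplied `rerank` closure are stand-ins for the ACL gate, the FTS5 / PGroonga score, and the BGE-rerank-v2-m3 cross-encoder respectively. Only the stage sizes (100, 30, 10) and the ACL-first ordering come from the text.

```rust
#[derive(Debug, Clone)]
pub struct Chunk {
    pub doc_slug: String,
    pub readable: bool,      // outcome of the ACL gate for the current subject
    pub lexical: f32,        // stand-in for the FTS5 / PGroonga score
    pub embedding: Vec<f32>, // stand-in for a BGE-M3 vector
    pub text: String,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Keep the `k` highest-scoring chunks, best first.
fn top_k<F: Fn(&Chunk) -> f32>(mut items: Vec<Chunk>, k: usize, score: F) -> Vec<Chunk> {
    items.sort_by(|a, b| {
        score(b)
            .partial_cmp(&score(a))
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    items.truncate(k);
    items
}

pub fn search<R: Fn(&Chunk) -> f32>(corpus: &[Chunk], query_vec: &[f32], rerank: R) -> Vec<Chunk> {
    // 1. ACL gate first: unreadable chunks never reach any ranking stage.
    let visible: Vec<Chunk> = corpus.iter().filter(|c| c.readable).cloned().collect();
    // 2. Lexical layer: top-100.
    let lexical = top_k(visible, 100, |c| c.lexical);
    // 3. Semantic layer: top-30 by cosine similarity to the query vector.
    let semantic = top_k(lexical, 30, |c| cosine(&c.embedding, query_vec));
    // 4. Reranker: top-10.
    top_k(semantic, 10, rerank)
}
```

Because the filter runs before the first ranking stage, a restricted chunk cannot influence scores, ordering, or result counts, even when it would otherwise be the best lexical match.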
Flow 3 — "Ask this page" with citations
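A key step in this flow is checking that each citation returned with an answer really matches the span it points at, and falling back to "I don't know" otherwise. The sketch below shows one way that check could look. It is illustrative only: the `Citation` struct mirrors the `charStart` / `charEnd` / `snippet` fields of the GraphQL type, while `span_text`, `citation_is_valid`, `finalize_answer`, the exclusive end offset, and the rule that an uncited answer is also rejected are assumptions for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct Citation {
    pub document_id: String,
    pub char_start: usize,
    pub char_end: usize, // exclusive, counted in chars
    pub snippet: String,
}

/// Return the text at [start, end) in chars, or None if out of range.
pub fn span_text(plaintext: &str, start: usize, end: usize) -> Option<String> {
    if start >= end {
        return None;
    }
    let want = end - start;
    let s: String = plaintext.chars().skip(start).take(want).collect();
    if s.chars().count() == want { Some(s) } else { None }
}

/// A citation is valid only if its snippet equals the cited span exactly.
pub fn citation_is_valid(plaintext: &str, c: &Citation) -> bool {
    span_text(plaintext, c.char_start, c.char_end).map_or(false, |t| t == c.snippet)
}

/// Pass the answer through only when every citation checks out against
/// the plaintext of its source document (keyed by document id).
pub fn finalize_answer(
    answer: &str,
    citations: &[Citation],
    sources: &HashMap<String, String>,
) -> String {
    let all_valid = !citations.is_empty()
        && citations.iter().all(|c| {
            sources
                .get(&c.document_id)
                .map_or(false, |text| citation_is_valid(text, c))
        });
    if all_valid {
        answer.to_string()
    } else {
        "I don't know".to_string()
    }
}
```

Rejecting the whole answer on a single mismatched citation is the strict option; a real composer might instead drop the offending claim and keep the rest.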
Flow 4 — Promote to canonical
Promotion registers the doc as a Layer-1 canonical BRAIN source, not just a Layer-2 chunk.
Flow 5 — Notion import
Document lifecycle
A document's status is implicit (it always has a current version). Versions are immutable; archive / restore moves the current_version_id pointer. Promotion is a one-way state transition that registers the doc as a BRAIN canonical source.
Version retention
| Category | Retention | Notes |
|---|---|---|
| policy | 10 years | Required by Vietnamese Decree 13 / labour law. |
| runbook | 5 years | Incident-response audit support. |
| decision-log | indefinite | Mirrors BRAIN decisions retention. |
| reference | indefinite | Foundational facts. |
| how-to | 2 years | How-tos drift; old versions archived. |
| trust-center | indefinite | External commitments; provenance retained. |
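The retention table maps directly onto a lookup function. The sketch below is a minimal illustration; the `Category` enum and the `retention_years` / `retention_elapsed` names are hypothetical, and ages are given in whole years for simplicity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Category {
    HowTo,
    Reference,
    DecisionLog,
    Policy,
    Runbook,
    TrustCenter,
}

/// Retention period in years; `None` means versions are kept indefinitely.
pub fn retention_years(category: Category) -> Option<u32> {
    match category {
        Category::Policy => Some(10),
        Category::Runbook => Some(5),
        Category::HowTo => Some(2),
        Category::DecisionLog | Category::Reference | Category::TrustCenter => None,
    }
}

/// True once a non-current version has outlived its retention period.
/// The current version is never eligible, whatever its age.
pub fn retention_elapsed(category: Category, age_years: u32, is_current: bool) -> bool {
    if is_current {
        return false;
    }
    match retention_years(category) {
        Some(limit) => age_years >= limit,
        None => false,
    }
}
```

What happens once the period has elapsed (archive versus purge) is a policy decision outside this sketch; the table only says that old how-to versions are archived.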
Functional Requirements
The CyberOS FR catalogue is being rebuilt one feature at a time via the open fr-author Agent Skill.
Previous FR enumerations were archived 2026-05-14 and are no longer reflected on this page. PRD/SRS narrative remains authoritative for the spec; specific FRs land here as they are re-authored.
Non-Functional Requirements
NFRs from PRD §11.2 that KB must satisfy.
| NFR ID | Concern | Target | Measurement |
|---|---|---|---|
| NFR (pending) | Search p95 | ≤ 350 ms | OBS histogram |
| NFR (pending) | Q&A p95 (end-to-end) | ≤ 4 s | OBS + AI Gateway |
| NFR (pending) | BRAIN ingest p95 | ≤ 5 s (FR pending) | BI histogram |
| NFR (pending) | Document render p95 (cache-cold) | ≤ 250 ms | OBS histogram |
| NFR (pending) | Vietnamese-query recall (eval corpus) | ≥ 90% | quarterly review |
| NFR (pending) | Citation accuracy | ≥ 95% | monthly human review of 50 Q&A pairs |
| NFR (pending) | Q&A "I don't know" rate on out-of-corpus queries | ≥ 90% | red-team eval |
| NFR (pending) | ACL leak via search / Q&A | = 0 | CI test on every PR |
| NFR (pending) | HTML rendering XSS | = 0 | ammonia sanitisation + CSP |
| NFR (pending) | Service availability | ≥ 99.9% (28-day) | OBS SLO |
| NFR (pending) | Version durability | 0 lost saves under crash | chaos test |
| NFR (pending) | Policy / runbook retention (10 / 5 years) | 100% | retention policy enforcement |
Dependencies
KB depends on AUTH (RBAC + ACL), BRAIN (Layer 2 ingestion target + audit), AI Gateway (Q&A composer), MCP (CUO tools), and OBS. It is depended on by CUO (grounded answers), Trust Center readers, and downstream agents that ask KB questions.
[Dependency diagram. The opening of the diagram source is missing from this copy.]
- KB depends on: AUTH (RBAC + ACL), BRAIN (Layer 2 + audit), AI Gateway (Q&A composer), BGE-M3 + reranker (BRAIN-shared), MCP, OBS.
- KB is depended on by: CUO (grounded retrieval), Portal · P2 (client KB views), EMAIL (digests), CHAT (link previews).
- Status: BRAIN and the BGE-M3 + reranker node are marked shipped; KB, AUTH, AI Gateway, MCP, OBS, CUO, Portal, EMAIL, and CHAT are marked planned.
Compliance scope
KB holds policy and decision-log documents that are themselves compliance artefacts; it must satisfy retention, residency, and access-audit obligations.
| Regulation / standard | Article / clause | KB feature that satisfies it |
|---|---|---|
| Vietnam PDPL (Law 91/2025) | Art. 14 — DSAR | DSAR export of every doc a subject authored or edited. |
| Vietnam Decree 13/2023 | Art. 17 — Processing log | Every save / view writes a BRAIN audit row. |
| Vietnam Decree 53/2022 | Art. 26 — Residency | VN-tenant docs on hanoi-1 RDS + S3. |
| GDPR (EU 2016/679) | Art. 15 — Right of access | DSAR export. |
| GDPR | Art. 17 — Right to erasure | Document purge with audit row; KB Layer 2 chunk removal cascades to BRAIN. |
| ISO/IEC 27001:2022 | A.5.10 — Acceptable use | Policy docs live in KB; acceptance audit via read receipts. |
| ISO/IEC 27001:2022 | A.8.5 — Secure authentication | ACL-gated retrieval; share-link tokens time-bound. |
| SOC 2 Type II | CC2.2 — Internal communication | KB is the canonical doc surface for policy + runbook. |
| SOC 2 Type II | CC6.1 — Logical access | RBAC + per-doc ACL; ACL applied at retrieval. |
| OWASP Top-10 (web) | A03 — Injection (XSS) | ammonia HTML sanitisation; CSP headers. |
Risk entries
KB-specific risks tracked in the risk register.
| ID | Risk | Likelihood | Impact | Owner | Mitigation |
|---|---|---|---|---|---|
| R-KB-001 | ACL leak via search / Q&A surface | Low | High | CSO | ACL applied before reranking + LLM; CI test asserts a restricted doc cannot surface. |
| R-KB-002 | Q&A hallucinates a citation that does not match the cited span | Medium | Medium | CDO | QA composer validates every cited span_id; mismatched citations rejected; "I don't know" returned. |
| R-KB-003 | BRAIN ingest backlog blinds retrieval after major doc rewrite | Medium | Medium | CDO | p95 ≤ 5 s SLO; backlog alarm at > 60 s pages CDO. |
| R-KB-004 | Vietnamese tokenisation regression on PGroonga upgrade | Low | Medium | CTO | 50-query VN eval corpus run on every PGroonga upgrade. |
| R-KB-005 | XSS via markdown embedding raw HTML | Low | High | CSO | ammonia sanitiser + strict CSP; fuzz tests on every PR. |
| R-KB-006 | Notion import truncates large pages | Medium | Low | CTO | Page-size guard; rejected pages reported in import summary. |
| R-KB-007 | Share-link token replay after expiry | Low | Medium | CSO | Token expiry enforced server-side; revocation propagates to Redis cache within 30 s. |
| R-KB-008 | Promoted-to-canonical doc later modified, BRAIN canonical out of sync | Medium | Medium | CDO | Demotion required before edit on promoted doc; or auto re-promotion with audit row. |
| R-KB-009 | Translation drift between vi and en versions | Medium | Low | CCO | Drift detection (word-count + key-phrase diff); flagged in editor UI. |
| R-KB-010 | Public Trust Center page reveals private policy by misconfiguration | Low | High | CCO | Trust-center publish requires double-confirm + audit row; CI test asserts no role-restricted docs published. |
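The mitigation for R-KB-007 (server-side expiry enforcement for share-link tokens) can be sketched as a small check that runs on every access. This is an assumption-laden illustration rather than the planned share_link.rs: the `ShareLink` fields, the `ShareAccess` outcomes, the use of Unix seconds for time, and counting a view only on a granted access are all choices made for the example.

```rust
pub struct ShareLink {
    pub valid_until: u64,       // expiry as Unix seconds
    pub max_views: Option<u32>, // None means unlimited views until expiry
    pub views: u32,
    pub revoked: bool,
}

#[derive(Debug, PartialEq)]
pub enum ShareAccess {
    Granted,
    Revoked,
    Expired,
    ViewLimitReached,
}

/// Evaluate one access attempt at time `now` (Unix seconds).
/// The decision uses the server's clock and counters, never client input.
pub fn check_share_link(link: &mut ShareLink, now: u64) -> ShareAccess {
    if link.revoked {
        return ShareAccess::Revoked;
    }
    if now >= link.valid_until {
        return ShareAccess::Expired;
    }
    if let Some(max) = link.max_views {
        if link.views >= max {
            return ShareAccess::ViewLimitReached;
        }
    }
    link.views += 1; // count only successful views
    ShareAccess::Granted
}
```

Because the check is re-evaluated per request, a token replayed after `valid_until` is refused even if an earlier response was cached elsewhere; cache invalidation on revocation is a separate concern.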
KPIs
KB rolls up 9 KPIs covering search quality, Q&A grounding, ingestion latency, and editorial health.
| KPI | Formula | Source | Target |
|---|---|---|---|
| Search p95 | histogram | OBS | ≤ 350 ms |
| Q&A p95 | histogram | OBS + AI Gateway | ≤ 4 s |
| Citation accuracy | matching_spans / claims | monthly human review | ≥ 95% |
| BRAIN ingest p95 | histogram | BI | ≤ 5 s |
| VN-query recall | relevant_returned / relevant_total | quarterly eval | ≥ 90% |
| "I don't know" rate (out-of-corpus) | idk / total_questions | red-team | ≥ 90% |
| Docs per Member | active_docs / members | BRAIN audit | tracked; baseline 8 |
| Stale-doc rate | > 180 d untouched / total | BRAIN | ≤ 25% |
| ACL-leak incidents | count | BRAIN audit | = 0 |
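The ratio and histogram formulas in the table can be made concrete with two small helpers. This is a sketch only: in practice the p95 values would come from OBS histograms, whereas `p95` here uses the nearest-rank method over raw samples, and both function names are hypothetical.

```rust
/// Generic ratio KPI, e.g. matching_spans / claims or idk / total_questions.
/// Returns None when the denominator is zero so an empty review window
/// is not reported as 0%.
pub fn ratio(numerator: u64, denominator: u64) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(numerator as f64 / denominator as f64)
    }
}

/// Nearest-rank 95th percentile over raw latency samples in milliseconds.
pub fn p95(samples_ms: &[u64]) -> Option<u64> {
    if samples_ms.is_empty() {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_unstable();
    // 1-based rank = ceil(0.95 * n), computed in integers.
    let rank = (sorted.len() * 95 + 99) / 100;
    Some(sorted[rank - 1])
}
```

As a worked example of the citation-accuracy formula: if 48 of 50 reviewed claims match their cited spans, the ratio is 48 / 50 = 0.96, which meets the ≥ 95% target; 47 of 50 gives 0.94 and misses it.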
RACI matrix
KB is owned by the CDO seat (held by the CEO in the interim).
| Activity | CEO | CDO | CTO | CSO | CCO | CHRO |
|---|---|---|---|---|---|---|
| Service design + spec | A | R | C | C | C | I |
| Implementation | I | C | A/R | C | I | I |
| Promote-to-canonical approval | C | A/R | I | I | I | I |
| Trust Center publication | C | C | I | C | A/R | I |
| Policy doc authorship | C | C | I | C | I | A/R |
| ACL audit | C | C | R | A | I | I |
| Vietnamese-quality review | I | A/R | C | I | C | I |
| Notion / external import | I | A/R | C | I | I | I |
| DSAR fulfilment | I | C | C | R | I | I |
R Responsible · A Accountable · C Consulted · I Informed.
Planned CLI surface
cyberos-kb is the planned CLI for tenant operators; it covers bulk import / export and CDO promotion review.
1. Create a doc from markdown
$ cyberos-kb create \
    --slug how-to-file-leave \
    --category how-to \
    --language vi \
    --file ./leave.md
[create] doc id: 01HZL1…
[render] markdown → HTML (sanitised)
[brain] enqueued ingest job
[audit] brain seq=15401 chain=…
2. Link a translation
$ cyberos-kb link-translation \
    --vi how-to-file-leave \
    --en how-to-file-leave-en
[link] translation_of pair created (vi ↔ en)
[audit] brain seq=15402 chain=…
3. Search
$ cyberos-kb search "hóa đơn cấp khi nào" --limit 5
rank  slug                           score  snippet
1     how-to-issue-hoadon            0.94   "...cấp hóa đơn khi giao hàng hoặc..."
2     policy-hoadon-issuance         0.89   "...Theo Circular 78/2021, hóa đơn phải cấp..."
3     runbook-hoadon-failure         0.76   "...nếu cấp hóa đơn thất bại, kiểm tra..."
4     decision-log-hoadon-migration  0.71   "...chúng tôi chọn vn-vat-invoice vì..."
5     reference-hoadon-fields        0.68   "...trường mst, đơn vị tính, thuế suất..."
4. Ask a page
$ cyberos-kb ask-page \
    --slug how-to-issue-hoadon \
    "When must we issue the hóa đơn?"
answer (confidence: 0.91):
The hóa đơn must be issued at the time of delivery of goods or completion
of service provision [1], with the exception of advance payment scenarios
where it must be issued within 5 working days of receipt [2].
citations:
[1] how-to-issue-hoadon · v3 · spans 142-298
[2] policy-hoadon-issuance · v7 · spans 1402-1490
5. Promote to canonical
$ cyberos-kb promote --slug policy-leave --canonical-path memories/policy/leave.md
[validate] requesting CDO approval (CHAT Notify sent)
[approved] by stephen@cyberskill.world at 2026-05-14T09:32Z
[brain] canonical entry created at memories/policy/leave.md
[audit] brain seq=15418 chain=…
6. Notion import
$ cyberos-kb import notion --zip notion-export.zip --map cat=how-to
[parse] 247 pages found in zip
[convert] blocks → markdown
[create] ✓ 244 docs created · ✗ 3 (in errors.csv)
[brain] all enqueued
[audit] brain seq=15489 chain=…
7. Export a tree
$ cyberos-kb export --root policy --format markdown --output ./policies/
[export] 28 docs · 4 categories · written to ./policies/
[manifest] policies/INDEX.md generated
Phase status & estimates
| Capability | Status |
|---|---|
| Markdown editor + frontmatter | planned · P1 |
| Immutable versioning + diff | planned · P1 |
| Per-page ACL + share-link tokens | planned · P1 |
| FTS5 / PGroonga + semantic + reranker | planned · P1 |
| "Ask this page" with citations | planned · P1 |
| "Ask the KB" (whole-corpus QA) | planned · P1 |
| BRAIN ingest p95 ≤ 5 s | planned · P1 |
| Promote-to-canonical (CDO gate) | planned · P1 |
| Translation linkage (vi ↔ en) | planned · P1 |
| Backlink graph | planned · P1 |
| Notion import + markdown export | planned · P1 |
| Trust Center public-readable pages | planned · P1 |
| Attachment AV scan | planned · P1 |
| Confluence / GitBook import | planned · P2+ |
| Real-time collaborative editing (Yjs) | planned · P2+ |
| Translation auto-draft via AI | planned · P2+ |
References
- PRD §9.12 — KB product FRs.
- PRD §19.7 — KB SRS-tier FRs.
- PRD §11.2 — NFRs.
- SRS §4.12 — Formal (FR pending) through (FR pending).
- BAAI BGE-M3 — multilingual embedding model (used for semantic layer).
- BAAI BGE-rerank-v2-m3 — cross-encoder reranker.
- PGroonga — Postgres full-text search with Vietnamese bigram tokenisation.
- pulldown-cmark — CommonMark + GFM parser for Rust.
- ammonia — HTML sanitiser for Rust.
- CommonMark + GFM — markdown specifications.
- Vietnam Decree 13/2023/NĐ-CP — Personal data processing.
- Vietnam Law 91/2025/QH15 (PDPL).
- Notion export format — ZIP of markdown + assets.
- Architecture context: infrastructure.html#kb.