Document-centric KV Cache

Memory woven
into every document.

Prefill once. Query forever. Minnesväv caches your documents' full AI context locally — so every query is cheaper, faster, and never leaves your infrastructure.

~90% prefill cost reduction
One prefill per document
0 documents leave your infra
Queries served from cache
The Problem

AI costs shouldn't be
a variable expense.

Every query against your internal documents re-processes the full context — a 200-page legal brief, a chip datasheet, a patient record. You pay full prefill cost every single time.

Today's approach

Query-centric inference

  • Full document re-processed on every query
  • Cost scales with users × queries × doc size
  • Context sent to third-party API servers
  • Unpredictable monthly bills
  • No document-level ACL on cache access
  • Per-user or per-query caching strategies
With Minnesväv

Document-centric caching

  • Document processed once; KV cache reused until the document or model changes
  • Prefill cost scales with corpus size, not query volume
  • KV cache stays entirely within your perimeter
  • Predictable pricing: per document, per prefill
  • Same ACL as your DMS, applied to the cache
  • Document-first: one cache entry per document
How it works

Prefill once.
Decode for every query.

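LLM inference has two phases: prefill, which processes the input context, and decode, which generates the response token by token. We make prefill a one-time cost per document, not a per-query tax paid by every user, every time.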

01

Document materializes

When a new document enters your DMS, Minnesväv runs prefill once — computing the full KV cache using your chosen open model, on your own hardware or private cloud.

prefill(doc_id, model) → kv_cache
// cost: O(doc_tokens) — one time
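
As a rough illustration of the "prefill once" idea, here is a minimal Python sketch. It is not Minnesväv's implementation: `PrefillCache`, the `(doc_id, model)` key, and the string pairs standing in for real key/value tensors are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

KVCache = List[Tuple[str, str]]  # toy stand-in for per-token key/value tensors


@dataclass
class PrefillCache:
    """Hypothetical in-memory store: one cache entry per (doc_id, model)."""
    entries: Dict[Tuple[str, str], KVCache] = field(default_factory=dict)
    tokens_prefilled: int = 0  # total prefill work done so far

    def prefill(self, doc_id: str, model: str, doc_tokens: List[str]) -> KVCache:
        key = (doc_id, model)
        if key not in self.entries:
            # Stand-in for the real forward pass: one (key, value) pair per token.
            self.entries[key] = [(f"k:{tok}", f"v:{tok}") for tok in doc_tokens]
            self.tokens_prefilled += len(doc_tokens)  # O(doc_tokens), paid once
        return self.entries[key]


cache = PrefillCache()
doc = "the parties agree to the terms below".split()
first = cache.prefill("brief-001", "open-model-v1", doc)
second = cache.prefill("brief-001", "open-model-v1", doc)  # cache hit, no new work
print(cache.tokens_prefilled)  # 7
```

The second call returns the stored entry without adding to `tokens_prefilled`, which is the property the step above relies on.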
02

Cache lives locally

The KV cache is stored in your own infrastructure — on-prem DRAM, edge SSDs, or your private cloud. It inherits the exact ACL of the source document. Nothing leaves your domain.

store(kv_cache, acl=doc.permissions)
// location: your_infra_only
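
One way to picture ACL inheritance is a store that copies the document's permissions at write time and checks them on every read. The sketch below is an assumption-laden toy: `AclCacheStore`, `CacheEntry`, and the user names are hypothetical, and a real deployment would also need to re-sync permissions when they change in the DMS.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class CacheEntry:
    kv_cache: bytes
    allowed_users: FrozenSet[str]  # copied from the source document's ACL


class AclCacheStore:
    """Hypothetical local store that checks the document ACL on every read."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def store(self, doc_id: str, kv_cache: bytes, doc_permissions: Iterable[str]) -> None:
        self._entries[doc_id] = CacheEntry(kv_cache, frozenset(doc_permissions))

    def load(self, doc_id: str, user: str) -> bytes:
        entry = self._entries[doc_id]
        if user not in entry.allowed_users:
            raise PermissionError(f"{user} may not read the cache for {doc_id}")
        return entry.kv_cache


store = AclCacheStore()
store.store("record-42", b"<kv bytes>", doc_permissions={"dr.lind", "nurse.berg"})
print(store.load("record-42", "dr.lind"))  # b'<kv bytes>'
```

A user outside the copied permission set gets a `PermissionError` instead of the cached bytes.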
03

Queries decode cheaply

Every user query loads the cached context, so the model computes only the query tokens and the generated answer. The document is never re-processed: new tokens still attend over the cached document, but the document's prefill compute is not repeated.

decode(query, kv_ref=doc_id)
// cost: query + output tokens only, no document prefill
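
To make the decode step concrete, the toy function below counts the work a query triggers when the document's cache already exists. It is a sketch under stated assumptions: `decode`, `kv_store`, and the returned fields are hypothetical names, and no real model is run.

```python
from typing import Dict, List


def decode(query_tokens: List[str], kv_ref: str,
           kv_store: Dict[str, List[str]], max_new_tokens: int = 4) -> Dict[str, int]:
    """Toy decode step: reuse cached document context instead of re-reading it."""
    if kv_ref not in kv_store:
        raise KeyError(f"{kv_ref} has not been prefilled")
    cached_context = kv_store[kv_ref]  # reused as-is, not recomputed
    # Only the query tokens and the generated tokens need fresh computation.
    new_tokens_computed = len(query_tokens) + max_new_tokens
    # Each new token still attends over the cached context plus the new tokens.
    context_len = len(cached_context) + new_tokens_computed
    return {"new_tokens_computed": new_tokens_computed, "context_len": context_len}


kv_store = {"brief-001": ["tok"] * 1000}  # pretend a 1,000-token document is cached
result = decode(["what", "is", "the", "deadline", "?"], "brief-001", kv_store,
                max_new_tokens=20)
print(result)  # {'new_tokens_computed': 25, 'context_len': 1025}
```

The 1,000 cached tokens contribute to the attention context but not to the count of newly computed tokens.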
Architecture

Where the cache lives.

Minnesväv integrates with your DMS — EHR, SharePoint, Confluence, internal wikis. The cache store is fully within your perimeter. The model runs on cost-efficient open-weight inference.

Source (DMS / EHR) → Prefill Engine (Open Model) → KV Cache Store (Your Infra) → Decode Only (User Query) → Response (Fast & Cheap)

Trigger: Document create or update event
Cache invalidation: Only when document or model changes
Access control: Inherited from source document ACL
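
A common way to get "invalidate only when the document or model changes" is to derive the cache key from the document content and the model identifier. The sketch below assumes that approach; `cache_key` and `needs_prefill` are hypothetical helpers, and only Python's standard `hashlib` is used.

```python
import hashlib
from typing import Optional


def cache_key(doc_id: str, doc_bytes: bytes, model_id: str) -> str:
    """Key that changes whenever the document content or the model changes."""
    digest = hashlib.sha256()
    for part in (doc_id.encode("utf-8"), doc_bytes, model_id.encode("utf-8")):
        # Length prefix keeps different field splits from hashing identically.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def needs_prefill(stored_key: Optional[str], doc_id: str,
                  doc_bytes: bytes, model_id: str) -> bool:
    """True when no cache exists yet or the stored key is stale."""
    return stored_key != cache_key(doc_id, doc_bytes, model_id)


key_v1 = cache_key("spec-7", b"rev A contents", "open-model-v1")
print(needs_prefill(key_v1, "spec-7", b"rev A contents", "open-model-v1"))  # False
print(needs_prefill(key_v1, "spec-7", b"rev B contents", "open-model-v1"))  # True
print(needs_prefill(key_v1, "spec-7", b"rev A contents", "open-model-v2"))  # True
```

On a create or update event, a check like `needs_prefill` would decide whether the prefill engine has any work to do.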
Use Cases

Built for document-heavy industries.

Companies whose document corpus is stable and repeatedly queried see the highest ROI: prefilling once translates directly into predictable unit costs that shrink as query volume grows.

Legal

Law Firms & Compliance

Case files, contracts, and regulatory briefs queried by dozens of attorneys. Cache each document once, serve thousands of queries — with document-level ACL enforced on every cache access.

→ 70–90% reduction in per-query prefill cost
Healthcare

Hospitals & Clinical Systems

Patient records in EHR systems queried across departments. The KV cache stays within your HIPAA perimeter. Prefill and decode both run inside that perimeter, so PHI is never sent to external APIs.

→ Zero data egress: prefill and decode stay local
Semiconductors

Hardware & Datasheet-Driven R&D

Thousands of datasheets, design specs, and technical errata queried daily by engineers. Prefill the corpus once — engineers query fast without re-processing 500-page PDFs every time.

→ Prefill once per spec version, not per query
Fintech

Financial Services & Fintech

Risk models, regulatory filings, and fund prospectuses queried by analysts. Per-document billing maps cleanly to fintech unit economics — a line item you can actually predict and budget.

→ Predictable monthly AI spend
Pricing Model

Pay per document,
not per query.

Cost structure
Prefill (per document): billed once
Re-prefill on doc update: billed on change
Re-prefill on model change: billed on change
Decode (per query): query tokens only
KV cache storage: your infra
Data egress to Minnesväv: $0, stays local
  • Predictable by design
    Prefill costs scale with your document corpus, not query volume, so finance teams can budget them from a document count.
  • Pay only on change
    Cache is rebuilt only when a document is updated or you switch model versions. A stable corpus pays once, period.
  • Open and proprietary models both supported
    For open models, we cache the KV state. For proprietary APIs (Claude, GPT-4), we cache document-level query trees — same document-first guarantee, different mechanism.
  • Low-cost inference hardware
    Works with cost-efficient inference engines — no dependency on expensive NVIDIA GPU clusters. Use commodity hardware or your existing on-prem investment.
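
The difference between the two cost structures can be checked with simple token arithmetic. The sketch below counts input tokens processed, not dollars; the function names and the example numbers (document size, query size, query count, re-prefills) are illustrative assumptions, not measured figures.

```python
def query_centric_tokens(queries: int, doc_tokens: int, query_tokens: int) -> int:
    """Every query re-processes the full document plus the query itself."""
    return queries * (doc_tokens + query_tokens)


def document_centric_tokens(queries: int, doc_tokens: int, query_tokens: int,
                            re_prefills: int = 0) -> int:
    """Document is prefilled once, plus once per update or model change."""
    return (1 + re_prefills) * doc_tokens + queries * query_tokens


# Illustrative numbers: a 100,000-token document, 100-token queries,
# 1,000 queries in the period, and 2 re-prefills after document updates.
before = query_centric_tokens(1_000, 100_000, 100)
after = document_centric_tokens(1_000, 100_000, 100, re_prefills=2)
print(before)  # 100100000
print(after)   # 400000
```

Under these assumptions the per-query term no longer includes the document, so the gap widens as query volume grows and narrows as documents change more often.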
Early Access

Your memory should stay
where your documents live.

Minnesväv is in private beta with select design partners in legal, healthcare, and semiconductors. Join the waitlist to get early access and shape the roadmap.