Document-centric KV Cache

Memory woven
into every document.

Prefill once. Query forever. Minnesväv stores each document's KV cache (the model's precomputed attention state) locally, so every query is cheaper and faster, and nothing leaves your infrastructure.

~90% prefill cost reduction
1× prefill per document
0 documents leave your infra
∞ queries served from cache

The Problem

AI costs shouldn't be
a variable expense.

Every query against your internal documents re-processes the full context — a 200-page legal brief, a chip datasheet, a patient record. You pay full prefill cost every single time.

Today's approach

Query-centric inference

- Full document re-processed on every query
- Cost scales with users × queries × doc size
- Context sent to third-party API servers
- Unpredictable monthly bills
- No document-level ACL on cache access
- Per-user or per-query caching strategies

With Minnesväv

Document-centric caching

✓ Document processed once; KV cache reused until the document or model changes
✓ Prefill cost scales with corpus size, not query volume
✓ KV cache stays entirely within your perimeter
✓ Predictable pricing: per document, per prefill
✓ Same ACL as your DMS, applied to the cache
✓ Document-first: one cache entry per document
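
To make the two cost models concrete, here is a rough token-count sketch. The function names and the workload numbers (document size, query size, query count) are illustrative assumptions, not Minnesväv measurements. The count covers prefilled tokens only; it ignores output tokens and the work of reading the cache, so it is not a prediction of realized savings.

```python
def query_centric_prefill_tokens(doc_tokens: int, query_tokens: int, num_queries: int) -> int:
    """Every query re-processes the whole document plus its own text."""
    return num_queries * (doc_tokens + query_tokens)


def document_centric_prefill_tokens(doc_tokens: int, query_tokens: int, num_queries: int) -> int:
    """The document is prefilled once; each query prefills only its own tokens."""
    return doc_tokens + num_queries * query_tokens


# Hypothetical workload: one 100,000-token document, 1,000 queries of 50 tokens each.
baseline = query_centric_prefill_tokens(100_000, 50, 1_000)
cached = document_centric_prefill_tokens(100_000, 50, 1_000)
reduction = 1 - cached / baseline

print(f"baseline={baseline:,} cached={cached:,} reduction={reduction:.1%}")
```

The gap widens with the number of queries per document, which is why a stable, frequently queried corpus benefits most.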

How it works

Prefill once.
Decode for every query.

LLM inference has two phases. We make prefill a one-time cost per document — not a per-query tax paid by every user, every time.

Document materializes

When a new document enters your document management system (DMS), Minnesväv runs prefill once, computing the full KV cache with your chosen open model on your own hardware or private cloud.

prefill(doc_id, model) → kv_cache
// cost: O(doc_tokens) — one time
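
The "prefill once" step can be sketched as a memoized call keyed by document and model. This is a minimal illustration, not Minnesväv's implementation: `PrefillEngine`, `fake_prefill`, and the `bytes` payload are hypothetical stand-ins for a real inference engine and its tensors.

```python
from typing import Callable, Dict, Tuple

KVCache = bytes  # stand-in for the real key/value tensors


class PrefillEngine:
    """Runs the expensive prefill at most once per (doc_id, model) pair."""

    def __init__(self, run_prefill: Callable[[str, str], KVCache]) -> None:
        self._run_prefill = run_prefill
        self._cache: Dict[Tuple[str, str], KVCache] = {}
        self.prefill_calls = 0  # counts how often the expensive step actually ran

    def prefill(self, doc_id: str, doc_text: str, model: str) -> KVCache:
        key = (doc_id, model)
        if key not in self._cache:
            self.prefill_calls += 1
            self._cache[key] = self._run_prefill(doc_text, model)
        return self._cache[key]


def fake_prefill(doc_text: str, model: str) -> KVCache:
    # Placeholder for a real model forward pass over the document.
    return f"{model}:{len(doc_text.split())}".encode("utf-8")


engine = PrefillEngine(fake_prefill)
first = engine.prefill("brief-001", "a 200 page legal brief", "open-model-v1")
second = engine.prefill("brief-001", "a 200 page legal brief", "open-model-v1")
print(engine.prefill_calls)
```

Keying on the model as well as the document reflects the rule that a cache built by one model cannot be reused by another.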

Cache lives locally

The KV cache is stored in your own infrastructure — on-prem DRAM, edge SSDs, or your private cloud. It inherits the exact ACL of the source document. Nothing leaves your domain.

store(kv_cache, acl=doc.permissions)
// location: your_infra_only
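
One way to picture a cache entry that carries its document's permissions is the sketch below. `LocalCacheStore`, `CacheEntry`, and the principal names are assumptions made for illustration; the ACL is modeled as a simple set of user names, which is far simpler than a real DMS permission model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class CacheEntry:
    kv_cache: bytes
    allowed_principals: FrozenSet[str]  # copied from the source document's ACL


class LocalCacheStore:
    """In-memory stand-in for a cache store that lives inside your perimeter."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def store(self, doc_id: str, kv_cache: bytes, acl: Iterable[str]) -> None:
        self._entries[doc_id] = CacheEntry(kv_cache, frozenset(acl))

    def load(self, doc_id: str, principal: str) -> bytes:
        entry = self._entries[doc_id]
        if principal not in entry.allowed_principals:
            raise PermissionError(f"{principal} may not read cache for {doc_id}")
        return entry.kv_cache


cache_store = LocalCacheStore()
cache_store.store("ehr-42", b"kv-bytes", acl={"dr.lee", "nurse.kim"})
print(cache_store.load("ehr-42", "dr.lee"))
```

This sketch snapshots the ACL at store time. A production system would more likely re-check permissions against the DMS on each access, so that revoked access takes effect immediately.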

Queries decode cheaply

Every user query loads the cached context, processes only the query's own tokens, and then decodes the answer. The document is never re-processed, so per-query prefill cost depends on the query length, not on the length of the already-cached document.

decode(query, kv_ref=doc_id)
// cost: O(query_tokens) per query
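
The toy below shows why reusing cached keys and values gives the same result as recomputing everything. It is a deliberately simplified sketch: a single attention layer in plain Python where keys, values, and queries are the token vectors themselves (identity projections), and all function names are hypothetical. Note that each query token still attends over the cached document entries, so the saving is the skipped document prefill, not the cache read.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Softmax attention of one query vector over the given keys and values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def full_forward(tokens: List[Vector]) -> List[Vector]:
    """Causal attention over the whole sequence, recomputed from scratch."""
    return [attend(tokens[i], tokens[: i + 1], tokens[: i + 1]) for i in range(len(tokens))]


def prefill(doc_tokens: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """One-time step: keep the document's keys and values."""
    return list(doc_tokens), list(doc_tokens)


def decode_with_cache(query_tokens: List[Vector],
                      kv_cache: Tuple[List[Vector], List[Vector]]) -> List[Vector]:
    """Process only the query tokens, attending over cached document entries."""
    keys, values = (list(part) for part in kv_cache)  # copy so the shared cache is untouched
    outputs = []
    for token in query_tokens:
        keys.append(token)
        values.append(token)
        outputs.append(attend(token, keys, values))
    return outputs


doc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [[1.0, 1.0], [0.2, -0.3]]

doc_cache = prefill(doc)                              # paid once per document
from_cache = decode_with_cache(query, doc_cache)      # paid per query
recomputed = full_forward(doc + query)[len(doc):]     # the query-centric baseline
```

Both paths produce the same outputs for the query positions; the cached path simply never recomputes the document's own positions.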

Architecture

Where the cache lives.

Minnesväv integrates with your DMS — EHR, SharePoint, Confluence, internal wikis. The cache store is fully within your perimeter. The model runs on cost-efficient open-weight inference.

Source (DMS / EHR) → Prefill Engine (Open Model) → KV Cache Store (Your Infra) → Decode Only (User Query) → Response (Fast & Cheap)

Trigger: document create or update event
Cache invalidation: only when the document or model changes
Access control: inherited from source document ACL
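
The invalidation rule above can be sketched with a fingerprint over the document content and the model version: a create or update event triggers a re-prefill only when that fingerprint changes. `InvalidationTracker` and `fingerprint` are hypothetical names, and hashing raw bytes with SHA-256 is an assumption about how change detection might work, not a description of the product.

```python
import hashlib
from typing import Dict


def fingerprint(content: bytes, model_version: str) -> str:
    """Hash the model version together with the document bytes."""
    digest = hashlib.sha256()
    digest.update(model_version.encode("utf-8"))
    digest.update(b"\x00")  # separator between model version and content
    digest.update(content)
    return digest.hexdigest()


class InvalidationTracker:
    """Decides whether a document event requires a fresh prefill."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def needs_prefill(self, doc_id: str, content: bytes, model_version: str) -> bool:
        current = fingerprint(content, model_version)
        if self._seen.get(doc_id) == current:
            return False  # same document, same model: keep the existing cache
        self._seen[doc_id] = current
        return True


tracker = InvalidationTracker()
created = tracker.needs_prefill("spec-7", b"rev A", "open-model-v1")
unchanged = tracker.needs_prefill("spec-7", b"rev A", "open-model-v1")
updated = tracker.needs_prefill("spec-7", b"rev B", "open-model-v1")
new_model = tracker.needs_prefill("spec-7", b"rev B", "open-model-v2")
```

Hashing content rather than trusting the event alone means a no-op save in the DMS does not trigger a billed re-prefill.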

Use Cases

Built for document-heavy industries.

Companies whose document corpus is stable and repeatedly queried see the highest ROI — where prefill-once translates directly to predictable, shrinking unit costs.

Legal

Law Firms & Compliance

Case files, contracts, and regulatory briefs queried by dozens of attorneys. Cache each document once, serve thousands of queries — with document-level ACL enforced on every cache access.

→ 70–90% reduction in per-query prefill cost

Healthcare

Hospitals & Clinical Systems

Patient records in EHR systems queried across departments. The KV cache stays within your HIPAA perimeter. PHI is never sent to external APIs: the initial prefill and every later query run inside your own infrastructure.

→ Zero data egress, from the first prefill onward

Semiconductors

Hardware & Datasheet-Driven R&D

Thousands of datasheets, design specs, and technical errata queried daily by engineers. Prefill the corpus once — engineers query fast without re-processing 500-page PDFs every time.

→ Prefill once per spec version, not per query

Fintech

Financial Services & Fintech

Risk models, regulatory filings, and fund prospectuses queried by analysts. Per-document billing maps cleanly to fintech unit economics — a line item you can actually predict and budget.

→ Predictable monthly AI spend

Pricing Model

Pay per document,
not per query.

Cost structure

Prefill (per document): billed once
Re-prefill on doc update: billed on change
Re-prefill on model change: billed on change
Decode (per query): query tokens only
KV cache storage: your infra
Data egress to Minnesväv: $0, stays local
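
As a rough worked example of the cost structure, the sketch below counts billed prefill events over a period. The function name and the sample numbers are assumptions, and it further assumes that a model change re-prefills every document in a corpus of constant size. No prices are modeled, since the page does not state them.

```python
def billed_prefills(num_documents: int, document_updates: int, model_changes: int) -> int:
    """Count prefill events billed under a pay-per-document, pay-on-change scheme."""
    initial = num_documents                        # each document prefilled once
    on_update = document_updates                   # one re-prefill per document update
    on_model_change = model_changes * num_documents  # whole corpus rebuilt per model switch
    return initial + on_update + on_model_change


# Hypothetical year: 10,000 documents, 500 updates, one model switch.
events = billed_prefills(10_000, 500, 1)
stable = billed_prefills(10_000, 0, 0)
print(events, stable)
```

Query volume does not appear in the function at all; under this scheme it affects only the decode line, which is billed on query tokens.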

Predictable by design

Prefill costs scale with your document corpus, not query volume. Finance teams can model prefill spend from a document count; decode is billed on query tokens only.

Pay only on change

Cache is rebuilt only when a document is updated or you switch model versions. A stable corpus pays once, period.

Open and proprietary models both supported

For open models, we cache the KV state. For proprietary APIs (Claude, GPT-4), we cache document-level query trees — same document-first guarantee, different mechanism.

Low-cost inference hardware

Works with cost-efficient inference engines — no dependency on expensive NVIDIA GPU clusters. Use commodity hardware or your existing on-prem investment.
