How Doxi Protects Sensitive Documents: Zero-Knowledge Storage and Privacy-First AI
How Doxi achieves zero-knowledge storage: AES-256-GCM encryption in your browser, keys in IndexedDB, Setient holds only ciphertext. A technical deep dive into what the architecture guarantees — and what it doesn't.
Handling sensitive documents with AI creates a fundamental tension: AI analysis is genuinely useful, but it requires the AI to see the document. For legal filings, medical records, immigration papers, and client-privileged communications, that visibility creates real risk.
Doxi resolves this tension with a two-part architecture. Document storage is zero-knowledge: documents are encrypted in your browser before they reach Setient, using keys that never leave your device. Setient holds only ciphertext it cannot decrypt — not by policy, but by design. Document analysis is user-controlled: you decide when AI touches a document, what gets analysed, and nothing is retained after the session.
This article explains both parts technically, what guarantees each provides, and where the honest boundaries lie.
Part 1: Zero-Knowledge Storage
What Zero-Knowledge Storage Actually Means
"Zero-knowledge" is a specific architectural claim: the service provider holds data but structurally cannot read it. Doxi's storage provides this guarantee for your documents.
When you upload a document to Doxi, this is the precise sequence:
Step 1: Key generation in your browser
Doxi uses the Web Crypto API, a standardised cryptographic interface built into every modern browser, to generate a unique symmetric key for your document. This key is generated locally, inside your browser's sandbox, using the browser's cryptographically secure random number generator. It is never transmitted over the network.
Step 2: Client-side encryption before upload
The document is encrypted using AES-256-GCM (Advanced Encryption Standard with a 256-bit key in Galois/Counter Mode), applied to the raw file bytes before any network transmission begins. AES-256 is the same standard used by governments and financial institutions for classified and regulated data. GCM provides authenticated encryption: if the ciphertext has been tampered with, decryption fails rather than returning altered content.
This encryption happens in your browser tab. The first byte that leaves your device is already ciphertext.
Step 3: Key storage in IndexedDB
The encryption key is stored in your browser's IndexedDB, a local, persistent storage mechanism sandboxed to the Doxi origin. IndexedDB operates within the browser's same-origin policy: the key is readable only by code running on the Doxi origin in your browser, not by Setient's servers, not by scripts running on other origins, and not over any network connection.
Step 4: Ciphertext upload
What leaves your browser and arrives at Setient's servers is encrypted ciphertext. We store it. We cannot decrypt it. We hold no copy of your key. If Setient's infrastructure were compromised, an attacker would receive ciphertext that provides no access to your document content.
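The steps above can be sketched with the Web Crypto API. This is a minimal illustration under stated assumptions, not Doxi's source code: the function names `generateDocumentKey` and `encryptDocument` are hypothetical, `SubtleCrypto.generateKey` is used as one reasonable way to obtain a key from the browser's CSPRNG, and the 12-byte IV is the size conventionally used with GCM. It assumes a runtime that exposes a global `crypto` object (modern browsers, recent Node.js).

```typescript
// Hypothetical helper names; only the Web Crypto calls are real APIs.

/** Step 1: generate a fresh AES-256-GCM key locally. */
async function generateDocumentKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    true, // extractable so the key can be exported for backup (an assumption)
    ["encrypt", "decrypt"],
  );
}

/** Step 2: encrypt the raw file bytes before anything is uploaded. */
async function encryptDocument(key: CryptoKey, file: Blob) {
  // 96-bit IV; it must never be reused with the same key.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const plaintext = await file.arrayBuffer();
  // The result is the ciphertext followed by the 16-byte GCM authentication tag.
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  // Step 4: in this sketch, only these two values would be sent to the server.
  return { iv, ciphertext };
}
```

The IV is not secret, but it has to be stored alongside the ciphertext because decryption needs it. The key itself appears nowhere in the returned value.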
The Cryptographic Implementation
Document encryption: AES-256-GCM (Web Crypto API — SubtleCrypto.encrypt)
Key generation: CSPRNG via Web Crypto API (window.crypto.getRandomValues)
Key exchange: X25519 (ECDH) for secure key sharing across your devices
Key storage: Browser IndexedDB (origin-sandboxed, not transmitted)
Integrity: GCM authentication tag (tamper-evident ciphertext)
What Zero-Knowledge Storage Guarantees
- Setient cannot read stored documents: We hold ciphertext with no access to decryption keys
- A data breach does not expose document content: An attacker who obtains Setient's storage sees ciphertext that cannot be decrypted without your keys
- Setient cannot provide document content to third parties: What we cannot read, we cannot share — with anyone, including law enforcement or regulators
What Zero-Knowledge Storage Does Not Guarantee
- Lost keys mean permanently inaccessible documents: Zero-knowledge is absolute. There is no recovery mechanism, because any recovery mechanism would require Setient to hold a copy of your key — which would make the architecture not zero-knowledge. We strongly recommend exporting and backing up your keys using Doxi's key export feature.
- Storage zero-knowledge is distinct from processing zero-knowledge: When you choose to run AI analysis, the document leaves the zero-knowledge storage environment. This is covered in Part 2.
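The key backup recommended above could look roughly like the following. The article does not describe the format of Doxi's key export feature, so the JWK format and the function names `exportKeyForBackup` and `importKeyFromBackup` are assumptions made for illustration; the sketch also assumes the key was created as extractable.

```typescript
// Hypothetical helpers; the export format used by Doxi is not documented here.

/** Serialise a document key as a JWK string that can be stored offline. */
async function exportKeyForBackup(key: CryptoKey): Promise<string> {
  // Rejects if the key was created as non-extractable.
  const jwk = await crypto.subtle.exportKey("jwk", key);
  return JSON.stringify(jwk);
}

/** Restore a backed-up key, for example in a new browser profile. */
async function importKeyFromBackup(backup: string): Promise<CryptoKey> {
  const jwk = JSON.parse(backup);
  return crypto.subtle.importKey(
    "jwk",
    jwk,
    { name: "AES-GCM" },
    true,
    ["encrypt", "decrypt"],
  );
}
```

A backup produced this way contains the raw key material in readable form, so it deserves the same protection as the documents it unlocks, for example a password manager or an encrypted volume.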
Part 2: Privacy-First AI Analysis
How AI Document Analysis Works
AI analysis of stored documents is always user-initiated. When you choose to analyse a document, the following sequence occurs:
Step 1: Local decryption in your browser
The document is retrieved from Setient's storage as ciphertext and decrypted in your browser using the key from IndexedDB. At this point, the plaintext document exists only in your browser tab; it has not yet left the zero-knowledge storage environment.
Step 2: You initiate the analysis
You select what type of analysis to run and explicitly confirm the action. Doxi does not automatically analyse documents, does not run analysis in the background, and does not process documents on upload. You control when AI touches your documents.
Step 3: The document is sent to the AI model
The AI model (either a Setient-hosted model or an integrated third-party provider, depending on your plan and configuration) receives the document in plaintext for the duration of the analysis session. This is a fundamental requirement of how AI language models work: the model must read the text to analyse it.
We do not claim otherwise. During AI analysis, the AI sees your document.
Step 4: Results returned, content not retained
Analysis results are returned to your browser. No document content is stored after the session. No content is used for AI model training. The session is complete.
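The control flow of this sequence can be sketched as a single gated function. Everything here is hypothetical: `AnalysisDeps`, `decryptLocally`, `confirm`, `analyse`, and `runAnalysis` are placeholder names standing in for whatever the application provides, and the sketch only illustrates the ordering described above, with plaintext sent to the model only after an explicit confirmation.

```typescript
// Hypothetical interface; the three callbacks stand in for application code.
interface AnalysisDeps {
  /** Step 1: fetch ciphertext and decrypt it in the browser. */
  decryptLocally: (docId: string) => Promise<string>;
  /** Step 2: ask the user to confirm this specific analysis. */
  confirm: (docId: string) => Promise<boolean>;
  /** Step 3: send plaintext to the model and return its result. */
  analyse: (plaintext: string) => Promise<string>;
}

/** Returns the analysis result, or null if the user declined. */
async function runAnalysis(docId: string, deps: AnalysisDeps): Promise<string | null> {
  const plaintext = await deps.decryptLocally(docId);
  // Nothing leaves the browser unless the user explicitly agrees.
  if (!(await deps.confirm(docId))) {
    return null;
  }
  // Step 4: only the result is returned; this function stores nothing.
  return deps.analyse(plaintext);
}
```

Keeping the confirmation inside the same function that calls the model means there is no code path in this sketch that reaches `analyse` without passing through `confirm`.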
What "User-Controlled" Means in Practice
- You decide when documents are analysed: There is no automatic or background processing
- You decide what gets analysed: You select specific documents or sections; nothing is processed without your explicit action
- No persistent AI access: Once a session ends, the AI has no ongoing access to your document content
Being Honest About the Trade-Off
During AI analysis, the model sees your document content. This is inherent to AI analysis — language models process text they can read.
What Doxi provides is:
- Control over when this happens — it is always your explicit decision
- Transparency about what it means — we document exactly what the AI receives and for how long
- Strong data handling commitments — no training on your data, no retention after session completion, documented model provenance
- An architecture where storage protection is cryptographic, not policy-based
For environments where even user-triggered AI access to plaintext is not acceptable — certain classified categories, legally privileged communications — Doxi's storage architecture provides value as an encrypted document repository independent of AI analysis features.
How This Architecture Addresses Regulatory Requirements
UK and EU GDPR
Doxi's zero-knowledge storage architecture directly addresses Article 5(1)(f) of UK and EU GDPR, which requires "appropriate technical or organisational measures" to ensure security of personal data, including protection against unauthorised access.
Client-side AES-256-GCM encryption with keys never transmitted to Setient is a strong technical measure that supports this requirement for stored documents. Organisations should document this as a technical safeguard in their Records of Processing Activities (ROPA) and Data Protection Impact Assessments (DPIA).
HIPAA
For US healthcare organisations, zero-knowledge storage addresses several HIPAA Security Rule requirements under 45 CFR §164.312: encryption and decryption of Protected Health Information (164.312(a)(2)(iv)), and access control for PHI at rest (164.312(a)(1)). User-controlled AI analysis also supports the minimum-necessary standard of the Privacy Rule (45 CFR §164.502(b)): PHI is shared with the AI only when clinically justified and explicitly authorised.
Legal Privilege
For law firms handling privileged communications, Doxi's architecture provides a defence against inadvertent waiver. Documents in zero-knowledge storage were never accessible to Setient, removing the "third-party disclosure" argument that has undermined privilege claims in cloud-hosted document cases.
Use Cases: Where This Architecture Fits
Immigration Law Firms
Immigration case files contain detailed biographical data, status information, and supporting evidence that is highly sensitive. Doxi's zero-knowledge storage ensures client files are not exposed if cloud infrastructure is compromised. When solicitors choose to use AI for document review, that remains their explicit, auditable decision.
Healthcare Providers
Patient records may fall under both HIPAA and GDPR for providers that operate internationally. Zero-knowledge storage ensures that patient data at rest is encrypted with keys held only by the healthcare organisation. AI analysis of records occurs only when clinically indicated and explicitly initiated by authorised staff.
HR Departments
Employee records, disciplinary files, and compensation data are sensitive personal data subject to GDPR. Zero-knowledge storage ensures HR documents are accessible only to authorised personnel — not to Setient employees, cloud provider staff, or any party gaining access to Setient's infrastructure.
NGOs Working with Vulnerable Populations
NGOs handling case files for refugees, domestic violence survivors, and vulnerable adults operate in environments where data exposure carries direct human risk. Zero-knowledge storage provides protection that policy commitments alone cannot: Setient cannot provide data we cannot read.
Frequently Asked Questions
Is Doxi's storage architecture GDPR-compliant? Doxi's zero-knowledge storage is designed to satisfy Article 5(1)(f) of UK and EU GDPR. Whether your specific deployment is fully GDPR-compliant depends on your complete data processing context, including how you use AI analysis features. We recommend conducting a DPIA for your specific use case. Doxi's architecture provides a strong technical foundation — but compliance is the organisation's responsibility to document.
What happens if I lose my IndexedDB keys? If you lose access to the browser holding your IndexedDB keys without a backup, encrypted documents are permanently inaccessible. Doxi provides a key export mechanism — we strongly recommend using it to back up encryption keys to a secure location. Enterprise plans include key management infrastructure to handle device loss scenarios across teams.
Can I use Doxi in a fully on-premise configuration? Enterprise plans support self-hosted deployment where both document storage and AI analysis run on your own infrastructure. In this configuration, documents never leave your environment at any stage. Contact our team for enterprise deployment options.
How is the AI model for analysis selected? Standard plans use Setient-hosted AI models. Enterprise plans offer integration with your preferred AI provider, including on-premise model hosting for environments where cloud AI is not acceptable even for user-triggered analysis sessions.
What is the performance overhead of client-side encryption? AES-256-GCM encryption of a typical document (up to 10MB) adds approximately 50–200ms to upload and download operations. For most document workflows this is imperceptible. For high-volume batch processing, Doxi provides performance optimisation guidance.
Does Doxi support team access to the same encrypted documents? Yes. Doxi uses X25519 key exchange to enable secure key sharing across authorised devices and team members. Access control is managed through your Doxi account — only users you authorise can receive the key needed to decrypt a document.
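The article names X25519 but does not specify how the shared secret is used, so the following is one common pattern rather than a description of Doxi's protocol: the sender creates a one-off X25519 key pair, derives a wrapping key from the shared secret with HKDF-SHA-256, and encrypts the document key under it with AES-256-GCM. The function names, the HKDF `info` string, and the message layout are assumptions. To keep the sketch short and synchronous it uses Node.js's `node:crypto` module rather than the browser Web Crypto API.

```typescript
import {
  createCipheriv, createDecipheriv, diffieHellman,
  generateKeyPairSync, hkdfSync, randomBytes, type KeyObject,
} from "node:crypto";

/** Derive a 256-bit wrapping key from an X25519 shared secret. */
function deriveWrappingKey(privateKey: KeyObject, publicKey: KeyObject): Buffer {
  const shared = diffieHellman({ privateKey, publicKey });
  // "doc-key-wrap" is an arbitrary context label chosen for this sketch.
  return Buffer.from(hkdfSync("sha256", shared, Buffer.alloc(0), "doc-key-wrap", 32));
}

/** Wrap a document key so that only the recipient can recover it. */
function wrapDocumentKey(docKey: Buffer, recipientPublic: KeyObject) {
  const ephemeral = generateKeyPairSync("x25519"); // one-off sender key pair
  const wrappingKey = deriveWrappingKey(ephemeral.privateKey, recipientPublic);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", wrappingKey, iv);
  const wrapped = Buffer.concat([cipher.update(docKey), cipher.final()]);
  return { ephemeralPublic: ephemeral.publicKey, iv, wrapped, tag: cipher.getAuthTag() };
}

/** Recover the document key; throws if the message was altered or the key is wrong. */
function unwrapDocumentKey(
  msg: ReturnType<typeof wrapDocumentKey>,
  recipientPrivate: KeyObject,
): Buffer {
  const wrappingKey = deriveWrappingKey(recipientPrivate, msg.ephemeralPublic);
  const decipher = createDecipheriv("aes-256-gcm", wrappingKey, msg.iv);
  decipher.setAuthTag(msg.tag);
  return Buffer.concat([decipher.update(msg.wrapped), decipher.final()]);
}
```

In a scheme like this, a server relaying the wrapped message sees only public keys and ciphertext, which is consistent with the storage model described earlier in the article.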
Explore Doxi's zero-knowledge storage architecture or book a demo at /products/doxi. For enterprise deployments, contact our team to discuss architecture, compliance requirements, and self-hosted options.