Technical Glossary

This glossary explains both standard industry definitions and how these technologies specifically apply to THE WHEEL's privacy-first architecture.

🤖 AI & Machine Learning

Artificial intelligence technologies that power intelligent search and understanding

Chunking

The process of breaking down large documents into smaller, semantically meaningful segments. Proper chunking preserves context while staying within AI model token limits.

In THE WHEEL: We intelligently chunk your documents based on content type. Medical records preserve treatment sessions. Legal documents maintain clause boundaries. Conversations are split by topic shifts—not arbitrary token counts.
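
As a rough illustration of boundary-aware chunking, the sketch below packs whole paragraphs into chunks under a token budget instead of cutting at arbitrary positions. The names `count_tokens` and `chunk_by_paragraph`, the word-based token count, and the default budget are assumptions made for this example, not THE WHEEL's actual implementation.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def chunk_by_paragraph(document: str, max_tokens: int = 50) -> list[str]:
    """Group whole paragraphs into chunks of at most max_tokens where possible.

    A single paragraph larger than the budget is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for paragraph in (p.strip() for p in document.split("\n\n")):
        if not paragraph:
            continue
        size = count_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_tokens + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A content-aware chunker would swap the paragraph split for boundaries that suit the document type, such as sessions, clauses, or topic shifts.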

Context Window

The maximum amount of information (measured in tokens) that a language model can process in a single request. Different models have different context windows—some handle a few thousand tokens, while newer models can process hundreds of thousands. The context window includes both your input and the model's response.

In THE WHEEL: We intelligently retrieve only the most relevant chunks of your encrypted documents rather than trying to fit everything into the context window. This targeted approach gives you faster, more accurate responses than dumping massive amounts of information into the model.
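
One way to picture targeted retrieval is a greedy selection that fills a token budget with the most relevant chunks first. This is a minimal sketch: `select_chunks`, the relevance scores, and the word-based token estimate are hypothetical and only illustrate the idea.

```python
def select_chunks(scored_chunks: list[tuple[float, str]], token_budget: int) -> list[str]:
    """Pick the highest-scoring chunks that fit in the token budget."""
    selected: list[str] = []
    used = 0
    # Consider chunks from most to least relevant.
    for score, text in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = len(text.split())  # crude token estimate: one token per word
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected
```

Anything that does not fit is simply left out, so the model sees a small, relevant context rather than the whole document set.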

Embeddings

Numerical representations of text that capture semantic meaning. Similar concepts have similar embeddings, enabling computers to understand that "car" and "automobile" mean the same thing.

In THE WHEEL: We create embeddings of your document chunks, then apply an entity-specific mathematical transformation before storage. This enables semantic search using vectors (multi-dimensional lists of numbers)—when you search for "knee pain," we find documents about "joint discomfort" even if they don't use those exact words.
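
To make "similar concepts have similar embeddings" concrete, here is a small sketch that ranks documents by cosine similarity. The three-dimensional vectors are invented for illustration (real embedding models produce hundreds or thousands of dimensions), and the entity-specific transformation mentioned above is not modeled.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written so that related topics point the same way.
documents = {
    "joint discomfort after running": [0.8, 0.2, 0.1],
    "quarterly tax return": [0.0, 0.1, 0.9],
}
query = [0.9, 0.1, 0.0]  # pretend embedding of "knee pain"


def search(query_vector: list[float], k: int = 1) -> list[str]:
    """Return the k document names most similar to the query vector."""
    ranked = sorted(
        documents,
        key=lambda name: cosine_similarity(query_vector, documents[name]),
        reverse=True,
    )
    return ranked[:k]
```

Calling `search(query)` returns the joint discomfort document even though it shares no words with "knee pain".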

Hierarchical Navigable Small World (HNSW)

A graph-based algorithm for approximate nearest neighbor search in high-dimensional spaces. HNSW builds a multi-layer graph so that searches over millions of embeddings take milliseconds instead of a full scan, at the cost of occasionally missing the true nearest neighbor.

In THE WHEEL: HNSW indexes power our semantic search. Even with 100,000+ encrypted document chunks, searches return results in under 50 milliseconds.

Large Language Model (LLM)

A type of AI model trained on vast amounts of text to understand and generate language. LLMs accept input data (instructions and needed information) and configuration settings (parameters like temperature for randomness and max tokens for length) to produce outputs. Frontier models from providers like OpenAI, Anthropic, and Google have billions of parameters and excel at complex reasoning, while smaller models are faster and cheaper for simple tasks.

In THE WHEEL: We use multiple LLMs depending on your query complexity. Simple lookups use smaller, faster models with your encrypted data for instant results. Complex reasoning tasks route to frontier models with anonymized queries—getting you the best answer while protecting your privacy.

Named Entity Recognition (NER)

A natural language processing technique that identifies and classifies named entities in text (people, organizations, locations, dates, etc.). NER extracts structured information from unstructured text.

In THE WHEEL: NER anonymizes your queries before sending them to external AI. We automatically detect and remove names, dates, locations, and contact info—protecting your privacy even when using third-party models.
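
The sketch below shows the shape of query redaction, using regular expressions as a simple stand-in for a trained NER model. It only catches emails, US-style phone numbers, and ISO-style dates; detecting names and locations needs a real NER model. The `PATTERNS` table and `redact` function are hypothetical names for this example.

```python
import re

# Pattern-based stand-in for NER: each label maps to a regex for one kind of identifier.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def redact(query: str) -> str:
    """Replace each detected identifier with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        query = pattern.sub(f"[{label}]", query)
    return query
```

The redacted query keeps its meaning ("call about the visit") while the identifying details never leave the system.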

Retrieval-Augmented Generation (RAG)

An AI architecture that combines information retrieval with text generation. RAG systems first search for relevant documents, then use that context to generate accurate, grounded responses.

In THE WHEEL: Our patent-pending Private by Design RAG works with your encrypted data. We retrieve relevant encrypted chunks, decrypt them in secure memory, and generate responses—all without permanently storing plaintext.

Token

The basic unit of text that language models process. Tokens are typically words or word fragments—for example, "chatbot" might be one token, while "extraordinary" could be split into "extra" and "ordinary". Most LLMs have token limits that constrain how much information they can process at once.

In THE WHEEL: We chunk your documents intelligently to stay within token limits while preserving context. A typical document chunk is 500-1000 tokens, ensuring we can retrieve and process relevant information without hitting model constraints.

Vector Search

A search method that finds documents based on semantic similarity rather than exact keyword matches. Vector search uses embeddings to understand meaning and context.

In THE WHEEL: Vector search lets you ask questions in natural language. Search for "what did my doctor say about my knee?" and find the right conversation—even if it never mentions "doctor" or "knee" explicitly.

🏗️ Architecture

System design patterns and architectural concepts

Application Programming Interface (API)

A set of rules and protocols that allow different software applications to communicate. APIs define how requests should be formatted and what responses to expect.

In THE WHEEL: Our REST API lets you programmatically upload documents, search content, and export data. All API requests require authentication tokens and are rate-limited for security.

Circuit Breaker

A resilience pattern that prevents cascading failures by stopping requests to a failing service. Like an electrical circuit breaker, it "opens" when failures occur and "closes" when the service recovers.

In THE WHEEL: If an external service fails repeatedly, circuit breakers prevent request pileup and allow graceful degradation. Your queries still work—they just route to backup systems automatically.
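
A minimal sketch of the pattern, assuming a failure threshold and a reset timeout: after repeated failures the breaker opens and sends calls straight to a fallback, then allows one trial call once the timeout has passed. The `CircuitBreaker` class and its parameters are illustrative, not THE WHEEL's actual code.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, primary, fallback):
        """Run primary(), or fallback() while the circuit is open or primary fails."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # circuit open: skip the failing service entirely
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, the failing service receives no traffic at all, which gives it room to recover.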

Idempotency

A property where an operation produces the same result regardless of how many times it's executed. Idempotent APIs prevent duplicate actions when requests are retried.

In THE WHEEL: Document uploads use idempotency keys. If your upload fails and you retry, we detect the duplicate and return the original result—preventing double charges and duplicate content.
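
The idea can be sketched with an in-memory store keyed by idempotency key. `UploadService` and its result fields are hypothetical; a production system would persist the keys and expire them after some period.

```python
import uuid


class UploadService:
    def __init__(self):
        self._results = {}  # idempotency key -> result of the first successful upload

    def upload(self, idempotency_key: str, content: bytes) -> dict:
        """Store content once per key; a retry with the same key gets the original result."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = {"document_id": str(uuid.uuid4()), "size": len(content)}
        self._results[idempotency_key] = result
        return result
```

The client generates the key once per logical upload and reuses it on every retry, so a timeout followed by a retry never creates a second document.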

Metadata

Data that describes other data—such as file names, dates, tags, or document properties. Metadata helps organize and find information but doesn't contain the actual content.

In THE WHEEL: File names, upload dates, tags, and folder structure are stored as searchable metadata. Your document content is encrypted, but metadata helps you organize and filter—stored securely and isolated by entity.

Microservices

An architectural style that structures an application as a collection of loosely coupled, independently deployable services. Each service handles a specific business function.

In THE WHEEL: Encryption, search, and AI processing run as separate services. This isolation ensures that a failure in one service doesn't bring down the entire platform.

Multi-Tenancy

A software architecture where a single instance of an application serves multiple customers (tenants). Each tenant's data is logically isolated from other tenants, even though they share infrastructure.

In THE WHEEL: Personal and professional entities are isolated using Row Level Security (RLS) and encryption. You share infrastructure for efficiency, but your data is completely isolated from other users and organizations.

Rate Limiting

A technique to control the number of requests a client can make to an API within a time window. Rate limiting prevents abuse, ensures fair usage, and protects system resources.

In THE WHEEL: API rate limits protect against accidental infinite loops and malicious attacks. Free tier users get 100 queries/month; pro tier gets 10,000 queries/month.
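
A fixed-window counter is one simple way to enforce a limit like this. The sketch below is an assumption about how such a limiter could look, not a description of THE WHEEL's limiter; `FixedWindowLimiter` and its parameters are made up for the example.

```python
import time


class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: float, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests do not need to sleep
        self._windows = {}  # client id -> (window index, requests seen in that window)

    def allow(self, client_id: str) -> bool:
        """Return True if the client still has quota in the current window."""
        window = int(self.clock() // self.window_seconds)
        start, count = self._windows.get(client_id, (window, 0))
        if start != window:
            start, count = window, 0  # a new window began: reset the counter
        if count >= self.limit:
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True
```

A monthly quota is the same idea with a much longer window; token-bucket or sliding-window limiters smooth out bursts at window boundaries.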

Server-Sent Events (SSE)

A server push technology that allows servers to send real-time updates to clients over a single HTTP connection. SSE is simpler than WebSockets for one-way communication.

In THE WHEEL: Chat responses stream to you in real-time using SSE. You see the AI's response as it's generated, rather than waiting for the entire answer to complete.
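
On the wire, an SSE stream is plain text: each event is one or more `data:` lines followed by a blank line. The sketch below formats response fragments that way. The helper names and the final `done` event are assumptions for illustration, not part of THE WHEEL's API.

```python
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" field per line.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per generated fragment, then a hypothetical 'done' event."""
    for token in tokens:
        yield sse_event(token)
    yield sse_event("[DONE]", event="done")
```

A web framework would send these frames with the `text/event-stream` content type, and the browser's `EventSource` API would receive them one at a time.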

Volatile Memory

Computer memory that requires power to maintain stored data. When power is lost, its contents disappear; when a process ends, the operating system reclaims the memory, and applications can overwrite sensitive buffers before releasing them. RAM (Random Access Memory) is the most common type.

In THE WHEEL: Decrypted content exists only in volatile memory during AI processing—typically for just seconds. When processing completes, that memory is wiped. Your plaintext never touches permanent storage.

Webhooks

HTTP callbacks triggered by specific events. Webhooks allow applications to notify other systems when something happens, enabling real-time integrations.

In THE WHEEL: (Coming soon) Webhooks will notify your systems when documents finish processing or when specific search queries match new content.

🔐 Encryption & Security

Cryptographic standards and security protocols that protect your data

Advanced Encryption Standard (AES)

A symmetric encryption algorithm adopted by the U.S. government as the standard for encrypting sensitive information. AES-256 uses 256-bit keys, making it virtually unbreakable with current technology.

In THE WHEEL: We use AES-256-GCM to encrypt your personal documents and conversations at rest. Your data is encrypted before it's stored, and only you have the keys to decrypt it.

Bring Your Own Key (BYOK)

A model where customers use their own API keys for external services rather than relying on a provider's shared access. BYOK gives organizations direct control, usage visibility, and cost management.

In THE WHEEL: (Coming soon) Bring your own API keys for external LLM providers like Anthropic or OpenAI. You get direct billing, full usage transparency, and can enforce your own rate limits and policies.

Ciphertext

Encrypted data that appears as scrambled, unreadable text. Ciphertext can only be converted back to plaintext (decrypted) using the correct encryption key.

In THE WHEEL: Your documents are stored as ciphertext in Google Cloud Storage. Even if someone accessed our storage buckets, they'd only find encrypted files that can't be read without your encryption keys.

Federal Information Processing Standard (FIPS)

U.S. government standards for information processing; FIPS 140-2 is the standard for cryptographic modules. FIPS 140-2 Level 3 requires physical tamper resistance (beyond the tamper evidence required at Level 2) and identity-based authentication for accessing cryptographic keys.

In THE WHEEL: Our HSMs are FIPS 140-2 Level 3 certified, meeting the same security standards used by government agencies and financial institutions.

Galois/Counter Mode (GCM)

An authenticated encryption mode that provides both confidentiality and authenticity. GCM detects if encrypted data has been tampered with, preventing malicious modifications.

In THE WHEEL: AES-256-GCM ensures your encrypted documents haven't been tampered with. If anyone tries to modify your encrypted data, decryption will fail automatically.

HMAC-based Key Derivation Function (HKDF)

A cryptographic function that derives multiple keys from a single master key. HKDF creates unique keys for different purposes without storing them separately.

In THE WHEEL: We use HKDF to derive purpose-specific keys from your private encryption key. For example, keys for encrypting metadata like file names and tags are derived using HKDF, ensuring each encryption purpose has its own cryptographically independent key.
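
For readers who want to see the mechanics, here is a compact HKDF-SHA256 following the extract-then-expand steps of RFC 5869, built only on Python's standard library. The function name and the purpose labels in the usage note are assumptions for illustration; production code would normally use a vetted cryptography library.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """Derive `length` bytes from master_key following RFC 5869 (HKDF with SHA-256)."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: condense the input key material into a fixed-size pseudorandom key.
    # An empty salt defaults to a block of zero bytes, as the RFC specifies.
    prk = hmac.new(salt or b"\x00" * hash_len, master_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks, binding each one to the purpose label in `info`.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```

Calling `hkdf_sha256(key, b"metadata:file-names")` and `hkdf_sha256(key, b"metadata:tags")` with the same master key yields two unrelated 32-byte keys, which is what "one key per purpose" means in practice.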

Hardware Security Module (HSM)

A physical computing device that safeguards and manages cryptographic keys. HSMs are tamper-resistant and provide a secure environment for key generation and encryption operations.

In THE WHEEL: Your private encryption key is protected by Google Cloud HSM (FIPS 140-2 Level 3 certified). Even our engineers cannot access this key—it exists only in hardware-protected memory.

Key Management Service (KMS)

A cloud service that manages cryptographic keys for applications. KMS handles key creation, rotation, and access control, removing the burden of manual key management.

In THE WHEEL: We use Google Cloud KMS to manage your entity-level encryption keys. Keys are automatically rotated, and all key operations are logged for security auditing.

Nonce

A "number used once": a value that must never repeat under the same key, often generated randomly. Using a fresh nonce for every operation ensures that the same data encrypted multiple times produces different outputs, which prevents pattern analysis attacks.

In THE WHEEL: Document content uses random nonces for each encryption, ensuring identical documents produce different encrypted outputs. This prevents attackers from detecting duplicate content.

Optimal Asymmetric Encryption Padding (OAEP)

A padding scheme for RSA encryption that adds randomness and prevents certain cryptographic attacks. OAEP makes RSA encryption more secure against chosen-ciphertext attacks.

In THE WHEEL: RSA-OAEP ensures your encryption keys are protected with industry-standard padding, meeting NIST cryptographic recommendations.

Plaintext

Unencrypted, human-readable data. Plaintext can be read by anyone who accesses it without requiring decryption keys.

In THE WHEEL: Your documents are encrypted before storage—we never store plaintext versions. Decryption to plaintext happens only in volatile memory during query processing, then memory is immediately wiped.

Rivest-Shamir-Adleman (RSA)

A public-key cryptosystem used for secure data transmission. RSA uses two keys: a public key for encryption and a private key for decryption.

In THE WHEEL: RSA-OAEP-4096 is used in the cryptographic handshake between your device and the HSM during secure operations. The HSM holds the private key, ensuring keys can only be accessed through the hardware security module.

Salt

Random data added to inputs before hashing or encryption. Salts ensure that identical inputs produce unique outputs, preventing rainbow table attacks and pattern recognition.

In THE WHEEL: Salts are used in key derivation to ensure each entity's encryption keys are cryptographically unique, preventing any two entities from having identical key material.
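
The effect of a salt is easy to demonstrate with PBKDF2 from Python's standard library: the same secret combined with two different salts yields two unrelated keys. PBKDF2, the iteration count, and the variable names are choices made for this sketch; the source does not say which derivation function THE WHEEL pairs with its salts.

```python
import hashlib
import os


def derive_key(secret: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key from a secret and a salt using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=32)


# Hypothetical per-entity salts: random, stored alongside the entity, not secret.
salt_entity_a = os.urandom(16)
salt_entity_b = os.urandom(16)
```

With these salts, `derive_key(secret, salt_entity_a)` and `derive_key(secret, salt_entity_b)` differ even when `secret` is identical, while repeating a call with the same salt reproduces the same key.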

Session Key

A temporary encryption key used for a single session or transaction. Session keys are generated on-demand and discarded after use, limiting exposure if compromised.

In THE WHEEL: When encrypting multiple documents at once, we generate a single session key from the HSM to encrypt the batch—reducing HSM calls while maintaining security. The session key is destroyed immediately after the batch completes.

Transport Layer Security (TLS)

A cryptographic protocol that provides secure communication over a network. TLS 1.3 is the latest version, offering improved security and performance.

In THE WHEEL: All data in transit uses TLS 1.3 encryption. Your data is protected from the moment it leaves your device until it reaches our servers.

☁️ Infrastructure

Cloud platforms and services that host and scale THE WHEEL

Content Delivery Network (CDN)

A geographically distributed network of servers that cache and deliver content closer to users. CDNs reduce latency and improve load times by serving static assets from nearby locations.

In THE WHEEL: Our website and marketing pages are served through Firebase Hosting's global CDN. Your encrypted documents are never cached on edge servers—they're always fetched securely from origin.

Cloud Run

A fully managed serverless platform on Google Cloud that runs containerized applications. Cloud Run automatically scales from zero to thousands of instances based on traffic.

In THE WHEEL: Our backend runs on Cloud Run, scaling instantly when you need it and scaling to zero when you don't—keeping costs predictable. Each request runs in an isolated container for security.

Google Cloud Storage (GCS)

Object storage for unstructured data like documents, images, and backups. GCS is designed for 99.999999999% annual durability and encrypts data at rest by default.

In THE WHEEL: Your encrypted documents are stored in GCS with customer-managed encryption keys. Even if someone accessed our storage buckets, they'd only find encrypted files that can't be decrypted without your keys.

pgvector

A PostgreSQL extension that adds vector similarity search capabilities. pgvector stores embeddings and performs efficient nearest-neighbor searches using approximate indexes such as HNSW or IVFFlat.

In THE WHEEL: pgvector stores your encrypted document embeddings for semantic search. We can find relevant chunks across thousands of documents in under 50 milliseconds.

PostgreSQL

An open-source relational database known for reliability, data integrity, and advanced features like JSON support and full-text search. PostgreSQL is ACID-compliant and highly extensible.

In THE WHEEL: We use Cloud SQL PostgreSQL for metadata and access control. Document content is never stored in plaintext—only encrypted file references and search embeddings.

Redis

An in-memory data store used for caching, session management, and real-time analytics. Redis provides sub-millisecond response times by storing data entirely in RAM.

In THE WHEEL: Redis caches frequently accessed metadata and search results—speeding up repeat queries without storing sensitive plaintext. Cache entries expire automatically for security.
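
Automatic expiry can be pictured with a small in-memory cache that attaches a time-to-live to every entry. This is a plain-Python stand-in, not Redis itself, and the `TTLCache` class is hypothetical; Redis provides the same behavior natively through per-key expiration.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}  # key -> (expiry time, value)

    def set(self, key, value) -> None:
        self._entries[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key):
        """Return the cached value, or None if it is missing or has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # drop stale data as soon as it is noticed
            return None
        return value
```

Short lifetimes bound how long any cached metadata or search result can linger after the underlying data changes.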

Row Level Security (RLS)

A database security feature that restricts which rows a user can access based on policies. RLS enforces data isolation at the database level, preventing accidental data leakage.

In THE WHEEL: RLS ensures you only see your entity's data—even if there's a bug in application code. Every database query is automatically filtered to your entity ID. Professional entity members can't access each other's personal data.

🔒 Privacy & Compliance

Standards and regulations that govern data privacy and security

Anonymization

The process of removing or masking personally identifiable information from datasets. Proper anonymization makes it infeasible to re-identify individuals from the data.

In THE WHEEL: Before sending complex queries to external LLMs, we anonymize them by removing names, dates, locations, and identifying details. External AI sees only the semantic content—never your identity.

California Consumer Privacy Act (CCPA)

California state law that grants consumers rights over their personal data, including the right to know what data is collected, the right to delete data, and the right to opt out of data sales.

In THE WHEEL: We never sell your data. You can export or delete your data at any time. We collect only what's necessary to provide the service and clearly disclose all data practices.

General Data Protection Regulation (GDPR)

European Union regulation that governs data privacy and gives individuals control over their personal data. GDPR requires a lawful basis for processing (such as consent) and grants rights including data portability and erasure (the "right to be forgotten").

In THE WHEEL: You can export all your data or permanently delete your account at any time. We never sell your data or use it for advertising. Your data is yours—we're just the secure storage.

Health Insurance Portability and Accountability Act (HIPAA)

U.S. regulation that protects sensitive patient health information from disclosure without consent. HIPAA applies to covered entities (healthcare providers, health plans, and healthcare clearinghouses) and their business associates, not to individuals managing their own health records.

In THE WHEEL: You can securely store your personal health records, because HIPAA doesn't apply to individuals managing their own records. For healthcare providers, we're working toward formal HIPAA compliance and Business Associate Agreements (BAAs) for our enterprise tier.

Personally Identifiable Information (PII)

Any information that can identify a specific individual—including names, addresses, phone numbers, email addresses, Social Security numbers, and more. PII requires special protection under privacy laws.

In THE WHEEL: Your PII is encrypted at rest and anonymized before any external AI processing. Named entity recognition strips out names, dates, locations, and contact info from queries sent to third-party models.

Zero-Knowledge Architecture

A security model where the service provider has no knowledge of the data being stored. Even administrators cannot access user data without the user's encryption keys.

In THE WHEEL: Personal entities use zero-knowledge encryption. Your data is encrypted on your device before upload, and only you have the keys. We can't read your data—even if compelled by court order.