This document describes the step-by-step processing lifecycle of files uploaded to Rekall-IQ.

Step 1: Document Upload

An authorised Workspace Admin selects and uploads approved documents (such as PDF or TXT files) through the workspace administration interface.

Step 2: Private Cloud Storage

The uploaded file is transferred securely via HTTPS and stored in a private Supabase Storage bucket named documents. The bucket is access-controlled and is not publicly readable.

Step 3: Background Ingestion Queue

Upon successful storage, the upload API dispatches an asynchronous background processing event to Inngest and immediately returns a response to the user interface, which shows the document status as processing. Handing the work to a background job prevents request timeouts during large file ingestion.
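
The enqueue-and-return pattern described in this step can be sketched as follows. This is a minimal illustration only: the in-process queue stands in for Inngest, and the event name, payload fields, and the handle_upload function are hypothetical, not the actual Rekall-IQ API.

```python
import queue
import uuid

# Hypothetical in-process stand-in for the Inngest event queue.
ingestion_queue = queue.Queue()


def handle_upload(workspace_id: str, filename: str) -> dict:
    """Enqueue an ingestion event and return without waiting for processing."""
    document_id = str(uuid.uuid4())
    # Event name and payload fields are illustrative assumptions.
    ingestion_queue.put({
        "name": "document/uploaded",
        "data": {
            "document_id": document_id,
            "workspace_id": workspace_id,
            "filename": filename,
        },
    })
    # The response is sent before any parsing or embedding has started.
    return {"document_id": document_id, "status": "processing"}


response = handle_upload("ws_123", "handbook.pdf")
print(response["status"])
```

The point of the design is that the request handler does a constant amount of work regardless of file size; everything slow happens in the worker that consumes the event.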

Step 4: Text Extraction

Our backend worker downloads the file from Supabase Storage and parses its content to extract readable plaintext. If a document cannot be read or contains no extractable text, processing fails, and the status changes to failed with an error message.
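
The failure handling in this step might look roughly like the sketch below. It assumes a plain TXT upload (PDF parsing would need a dedicated library), and the function names and record fields are hypothetical.

```python
def extract_text(raw: bytes) -> str:
    """Decode a TXT upload as UTF-8; raises UnicodeDecodeError if unreadable."""
    return raw.decode("utf-8").strip()


def run_extraction(document: dict, raw: bytes) -> dict:
    """Return a copy of the document record with the extraction outcome."""
    try:
        text = extract_text(raw)
        if not text:
            raise ValueError("no extractable text found")
    except ValueError as exc:  # UnicodeDecodeError is a subclass of ValueError
        # Processing stops here and the error message is kept for display.
        return {**document, "status": "failed", "error": str(exc)}
    # Still "processing": chunking and embedding have not run yet.
    return {**document, "status": "processing", "text": text}


print(run_extraction({"id": "doc_1"}, b"   ")["status"])
```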

Step 5: Content Chunking

The extracted text is split into smaller, overlapping sections called “chunks.” Chunking lets a search query retrieve only the relevant passages rather than an entire multi-page document, which keeps answers precise and keeps the prompt sent to the AI model short.
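
As an illustration of overlapping chunks, here is a minimal character-based splitter. The chunk size and overlap values are assumptions chosen for the example; this document does not state the sizes Rekall-IQ uses, and a production splitter may break on sentences or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.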

Step 6: Vector Embedding Generation

Each text chunk is processed using the Google Gemini embedding model (gemini-embedding-001). This converts the human-readable text chunk into a high-dimensional mathematical vector that represents the semantic meaning of that content.

Step 7: Vector Indexing

The generated vectors, along with the text chunk content and a workspace_id metadata property, are stored in Qdrant Cloud. The document status is updated in the database to ready (displayed as “Completed” in the dashboard).
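
A stored record could be shaped roughly as below. This is a sketch under assumptions: only the vector, the chunk text, and workspace_id are named in this document, while the document_id and chunk_index fields, the ID scheme, and the build_point helper are hypothetical. No Qdrant client call is shown.

```python
import uuid


def build_point(vector, chunk_text, workspace_id, document_id, chunk_index):
    """Assemble one vector record with the metadata needed for filtering."""
    # Deterministic ID so re-indexing the same chunk overwrites the old record
    # (an assumed design choice, not confirmed by the source).
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{document_id}:{chunk_index}"))
    return {
        "id": point_id,
        "vector": vector,
        "payload": {
            "text": chunk_text,
            "workspace_id": workspace_id,  # used to scope every search
            "document_id": document_id,    # used when a document is deleted
            "chunk_index": chunk_index,
        },
    }


point = build_point([0.12, -0.03, 0.88], "Leave requests must be filed...", "ws_123", "doc_1", 0)
print(point["payload"]["workspace_id"])
```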

Step 8: Context Retrieval and Question Answering

When a team member enters a question in the chat interface:

The system generates a semantic vector of the user's question.
It performs a search in Qdrant, retrieving only chunks that match the user's workspace_id.
The retrieved text chunks are sent, alongside the query, to the Google Gemini model to formulate a response.
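
The workspace-scoped lookup can be illustrated with a small in-memory search. This is not the Qdrant API: it is a sketch that assumes cosine similarity as the distance measure and uses a hypothetical list of point dictionaries in place of a collection. The key property is that the workspace_id filter is applied before ranking, so chunks from other workspaces are never candidates.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query_vector, workspace_id, limit=3):
    """Return the most similar chunks that belong to the given workspace."""
    # Filter first: other workspaces are excluded before any scoring happens.
    candidates = [p for p in points if p["payload"]["workspace_id"] == workspace_id]
    ranked = sorted(
        candidates,
        key=lambda p: cosine_similarity(query_vector, p["vector"]),
        reverse=True,
    )
    return ranked[:limit]


points = [
    {"vector": [1.0, 0.0], "payload": {"workspace_id": "ws_a", "text": "Refund policy"}},
    {"vector": [1.0, 0.0], "payload": {"workspace_id": "ws_b", "text": "Other tenant"}},
]
print([p["payload"]["text"] for p in search(points, [1.0, 0.1], "ws_a")])
# ['Refund policy']
```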

Step 9: Answer Citations

The system returns the generated answer to the user alongside source citations, listing the specific document filenames and context snippets used. This enables users to audit the output and verify each answer against its sources.
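
One way to assemble citations from the retrieved chunks is sketched below. The field names, the snippet length, and the build_citations helper are assumptions made for illustration.

```python
def build_citations(chunks, snippet_length=160):
    """Turn retrieved chunks into filename-plus-snippet citation entries."""
    citations = []
    for chunk in chunks:
        # Collapse whitespace so the snippet reads as a single line.
        text = " ".join(chunk["text"].split())
        if len(text) > snippet_length:
            text = text[:snippet_length].rstrip() + "..."
        citations.append({"filename": chunk["filename"], "snippet": text})
    return citations


retrieved = [{"filename": "policy.pdf", "text": "Leave   requests\nmust be filed."}]
print(build_citations(retrieved))
# [{'filename': 'policy.pdf', 'snippet': 'Leave requests must be filed.'}]
```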

Step 10: Retention and Deletion

When a Workspace Admin deletes a document:

The file is permanently removed from the Supabase Storage bucket.
All associated database chunk records are deleted.
Related vectors matching the document ID are purged from Qdrant Cloud.
The document record is removed from the PostgreSQL database.
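
The four deletion steps above can be sketched against in-memory stand-ins. The dictionaries and lists below are hypothetical substitutes for Supabase Storage, the chunk table, Qdrant Cloud, and the PostgreSQL documents table; the storage_path field is also an assumption.

```python
def delete_document(document_id, storage, chunk_rows, vector_points, documents):
    """Remove a document and everything derived from it, in the listed order."""
    record = documents[document_id]
    # 1. Remove the uploaded file from storage.
    storage.pop(record["storage_path"], None)
    # 2. Delete the associated chunk records.
    chunk_rows[:] = [r for r in chunk_rows if r["document_id"] != document_id]
    # 3. Purge vectors whose payload references this document ID.
    vector_points[:] = [
        p for p in vector_points if p["payload"]["document_id"] != document_id
    ]
    # 4. Remove the document record itself last, since earlier steps read it.
    del documents[document_id]


storage = {"ws_a/doc_1.pdf": b"%PDF"}
documents = {"doc_1": {"storage_path": "ws_a/doc_1.pdf"}}
chunk_rows = [{"document_id": "doc_1", "text": "chunk"}]
vector_points = [{"payload": {"document_id": "doc_1"}}]
delete_document("doc_1", storage, chunk_rows, vector_points, documents)
print(storage, chunk_rows, vector_points, documents)
# {} [] [] {}
```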

Data Privacy Boundary Summary

All document text extraction, chunk indexing, and question-answering lookups are partitioned using the organisation's workspace_id. Under no circumstances is context retrieved from or shared with other workspaces on the platform.

Data Handling & Processing Lifecycle