RAG Engine — RolegacyAI

Retrieval as the Core of Role Memory

A role memory store is only as useful as its ability to surface the right knowledge at the right moment. The RAG Engine is the retrieval and generation layer of RolegacyAI: it connects a query (from a successor, from the Successor Brief Generator, or from an agentic workflow) to the relevant accumulated memories of the role, and uses that retrieved context to generate a grounded, accurate response.

Unlike general-purpose AI assistants, the RAG Engine is scoped to the role. It does not draw on generic training data when answering questions about how a specific role operates. It draws on the actual documented decisions, lessons, and processes of that specific role, captured from the people who held it.

How Retrieval Works

When a query enters the RAG Engine, it passes through five retrieval stages:

Query embedding: The query is embedded with the same model used during ingestion, so it lands in the same vector space as the role's memory entries.
Semantic search: The query embedding is matched against the vector index of the role's memory store to find the semantically closest memory entries.
Metadata filtering: The initial candidate set is filtered by metadata (memory type, confidence score, recency, role holder, and domain) to surface the entries most likely to be relevant and trustworthy.
Re-ranking: The filtered candidates are re-ranked by a combination of semantic relevance and operational importance (entries with high confidence, recent validation, or high-coverage domains rank higher).
Context assembly: The top-ranked entries are assembled into a structured context window, with source attribution preserved, and passed to the generation model.
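
The five stages above can be sketched in a few lines of Python. This is a minimal illustration, not RolegacyAI's implementation: the `MemoryEntry` fields, the bag-of-words `embed` function (a toy stand-in for a real embedding model), the filter thresholds, and the 0.7/0.3 re-ranking weights are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    entry_id: str
    text: str
    memory_type: str
    confidence: float   # assumed scale: 0.0 to 1.0
    age_days: int       # assumed recency measure

def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model: a normalised bag-of-words vector."""
    counts: dict[str, float] = {}
    for token in text.lower().split():
        token = token.strip(".,?!:;")
        if token:
            counts[token] = counts.get(token, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {t: v / norm for t, v in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are already normalised, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def retrieve(query, store, min_confidence=0.5, max_age_days=730, top_k=2):
    q = embed(query)                                          # 1. query embedding
    scored = [(cosine(q, embed(e.text)), e) for e in store]   # 2. semantic search
    candidates = [(s, e) for s, e in scored if s > 0.0]
    filtered = [(s, e) for s, e in candidates                 # 3. metadata filtering
                if e.confidence >= min_confidence and e.age_days <= max_age_days]
    reranked = sorted(filtered,                               # 4. re-ranking
                      key=lambda pair: 0.7 * pair[0] + 0.3 * pair[1].confidence,
                      reverse=True)
    return [e for _, e in reranked[:top_k]]

def assemble_context(entries) -> str:
    # 5. context assembly, keeping the entry ID as source attribution
    return "\n".join(f"[{e.entry_id}] ({e.memory_type}) {e.text}" for e in entries)

store = [
    MemoryEntry("m1", "Vendor invoices above 10k need CFO approval before payment.", "decision", 0.9, 100),
    MemoryEntry("m2", "Vendor invoices are approved by anyone.", "note", 0.2, 50),
    MemoryEntry("m3", "Quarterly board report is due first Monday.", "process", 0.95, 30),
    MemoryEntry("m4", "Approve vendor onboarding after security review.", "process", 0.8, 2000),
]
results = retrieve("Who must approve vendor invoices?", store)
context = assemble_context(results)
```

In this toy store, `m2` is dropped for low confidence, `m4` for age, and `m3` for having no overlap with the query, so only `m1` reaches the context. A production system would replace `embed` and the linear scan with a real embedding model and a vector index.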

Role-Scoped Retrieval

Retrieval is always role-scoped. The RAG Engine operates within the tenant and role boundaries defined by the access control layer, so a query about a specific role retrieves memories only from that role's store. Cross-role retrieval (for example, to find analogous decisions made in related roles across the organisation) is a separately authorized operation that respects inter-role access policies.
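
One way to picture this scoping is a check that runs before any search touches a store. The sketch below is illustrative only: `Scope`, `AccessDenied`, `CROSS_ROLE_GRANTS`, and the role names are hypothetical, and a real access control layer would hold its policies elsewhere.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scope:
    tenant_id: str
    role_id: str

class AccessDenied(Exception):
    """Raised when a query targets a role the requester is not authorised for."""

# Hypothetical inter-role policy: (requesting role, target role) pairs
# that have been explicitly authorised for cross-role retrieval.
CROSS_ROLE_GRANTS = {("head-of-finance", "head-of-procurement")}

def resolve_scopes(requester: Scope, target_roles: list[str]) -> list[Scope]:
    """Return the stores a query may read; never leaves the requester's tenant."""
    scopes = []
    for role_id in target_roles:
        own_role = role_id == requester.role_id
        granted = (requester.role_id, role_id) in CROSS_ROLE_GRANTS
        if not (own_role or granted):
            raise AccessDenied(f"{requester.role_id} may not read {role_id}")
        scopes.append(Scope(requester.tenant_id, role_id))
    return scopes

def scoped_entries(stores: dict[Scope, list[str]], scopes: list[Scope]) -> list[str]:
    # Only entries from the resolved scopes are ever visible to the search.
    return [text for scope in scopes for text in stores.get(scope, [])]

stores = {
    Scope("acme", "head-of-finance"): ["Close the books by day 5."],
    Scope("acme", "head-of-procurement"): ["Rebid contracts every 3 years."],
    Scope("acme", "head-of-hr"): ["Run salary review in March."],
    Scope("other", "head-of-finance"): ["Other tenant note."],
}
requester = Scope("acme", "head-of-finance")

own = scoped_entries(stores, resolve_scopes(requester, ["head-of-finance"]))
cross = scoped_entries(
    stores, resolve_scopes(requester, ["head-of-finance", "head-of-procurement"])
)
try:
    resolve_scopes(requester, ["head-of-hr"])
    denied = False
except AccessDenied:
    denied = True
```

Because every resolved scope reuses the requester's `tenant_id`, a same-named role in another tenant can never be reached, and a cross-role read succeeds only when a grant exists.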

Grounded Generation

Generation in RolegacyAI's RAG Engine is intentionally constrained. The generation model is instructed to answer from the retrieved role memories, to cite the specific memory entries it draws on, and to flag when a question cannot be answered from the available memory instead of generating a plausible-sounding but unsupported response. This grounding is essential: a successor who receives hallucinated operational guidance from an AI is worse off than one who receives no guidance at all.
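
The three constraints (answer from memory, cite entries, flag gaps) can be expressed as a prompt template plus a simple check on the model's output. The following is a hedged sketch: the prompt wording, the `NO_ANSWER` string, and the bracketed-ID citation format are assumptions for illustration, and no model call is shown.

```python
import re

# Assumed fixed refusal text the model is told to use when memory is insufficient.
NO_ANSWER = "This question cannot be answered from the available role memory."

def build_prompt(question: str, entries: dict[str, str]) -> str:
    """Assemble a grounded prompt from retrieved entries keyed by entry ID."""
    context = "\n".join(f"[{eid}] {text}" for eid, text in entries.items())
    return (
        "Answer using only the role memories below. "
        "Cite the entry ID in square brackets after each claim. "
        f"If the memories do not contain the answer, reply exactly: {NO_ANSWER}\n\n"
        f"Role memories:\n{context}\n\nQuestion: {question}"
    )

def check_citations(answer: str, entries: dict[str, str]) -> bool:
    """Accept an answer only if it is the refusal text or cites known entries."""
    if answer.strip() == NO_ANSWER:
        return True
    cited = set(re.findall(r"\[([\w-]+)\]", answer))
    # At least one citation is required, and every cited ID must have been retrieved.
    return bool(cited) and cited <= set(entries)

entries = {"m1": "Vendor invoices above 10k need CFO approval before payment."}
prompt = build_prompt("Who must approve vendor invoices?", entries)

ok_cited = check_citations("Invoices above 10k need CFO approval [m1].", entries)
ok_refusal = check_citations(NO_ANSWER, entries)
bad_unknown = check_citations("The COO signs off on all invoices [m9].", entries)
bad_uncited = check_citations("Anyone can approve invoices.", entries)
```

A check like this only confirms that citations point at retrieved entries. It does not verify that the cited entry actually supports the claim, which would need a separate validation step.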

Preserve role memory before key people move on.

Interested in applying the RAG Engine approach to your organisation? Register your interest in RolegacyAI to explore whether the loss of role memory is a problem you need to solve.
