Vector Database Access Control Is Missing, and RAG Pipelines Pay for It

A vision paper posted to arXiv on 2026-06-18 frames the lack of fine-grained access control in vector stores as a database-systems problem rather than an application-layer one. Production vector databases enforce access control at the collection or index boundary, not at the embedding. Once a document is embedded into a shared vector store backing retrieval-augmented generation, approximate nearest-neighbor search can surface it to any user whose query is close enough, regardless of the row-level policy that governed the source row.

What is fine-grained access control, and why do vector databases break it?

Fine-grained access control (FGAC) is the database property that makes each query return only the rows a specific user’s policy permits. Relational systems have carried this primitive for decades: Postgres row-level security, Oracle Virtual Private Database, and SQL Server row-level security all intersect the result set with a policy predicate before a single row crosses the wire. The caller never sees a forbidden row because the engine never produces one.

The arXiv paper, “Policy-aware Vector Search: A Vision for Fine Grained Access Control in Vector Databases” (arXiv:2606.19803), states that FGAC is “not fully supported in modern vector databases,” even as those stores are used in “security-sensitive RAG and organizational AI pipelines.” The structural reason is that vector stores do not return rows. They return approximate nearest neighbors over a blended space of dense semantic vectors and, sometimes, structured attributes. The relational access-control model assumes the engine can cheaply intersect a policy against the candidate set. ANN search, by design, returns the closest vectors to a query, not the closest authorized vectors, and the indexing structures that make ANN fast were not built with per-user predicates in mind.

How does a shared embedding store defeat row-level Postgres permissions?

The leak happens at the retrieval step, not the generation step. In the reference RAG flow AWS describes, an embedding model converts source data into numerical vectors stored in a vector database, and the user’s query is itself vectorized and matched against that store. The matcher operates over the union of all embeddings; if a document was authorized for ingestion but not for a given end user, nothing in the retrieval step asks whether the end user may read it.

The migration path engineering teams actually follow flattens Postgres rows protected by row-level security into chunks, embeds them, and writes them into a single vector index keyed only by collection. The RLS predicate that protected the row in Postgres does not travel with the embedding. A user in tenant A who asks the assistant a question close in vector space to tenant B’s content can retrieve B’s chunks. The leak is silent because the retrieval layer has no notion it happened; the LLM then summarizes the chunk as if it were legitimately retrieved.
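The leak path can be illustrated with a toy example. The sketch below is not any vendor's API: `SHARED_INDEX`, `search`, the tenant labels, and the three-dimensional vectors are all hypothetical, and exact cosine ranking stands in for ANN search. It only shows that a matcher which never reads the tenant label returns whatever is closest.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical shared index: (chunk_id, tenant, embedding).
# The tenant label is stored, but nothing in search() consults it.
SHARED_INDEX = [
    ("a-1", "tenant_a", [0.9, 0.1, 0.0]),
    ("a-2", "tenant_a", [0.1, 0.9, 0.0]),
    ("b-1", "tenant_b", [0.0, 0.2, 0.9]),
]

def search(query_vec, k=1):
    """Rank every embedding by similarity; no caller identity is involved."""
    ranked = sorted(SHARED_INDEX, key=lambda row: cosine(query_vec, row[2]), reverse=True)
    return [(chunk_id, tenant) for chunk_id, tenant, _ in ranked[:k]]

# A tenant A user asks something that happens to sit near tenant B's content.
leaked = search([0.0, 0.1, 1.0], k=1)
print(leaked)
```

In this toy setup the top hit belongs to tenant B, and nothing in the call records that the caller was from tenant A.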

Patching this with an output filter on the LLM targets the wrong layer. By the time the generation model sees the chunks, the unauthorized content is already in the context window. A filter that screens the final answer for leakage would have to know, for every fact it emits, whether the caller was authorized to see the source that fact came from. That requires reconstructing the access-control decision the vector store skipped, inside a component not designed to make access-control decisions.

What do Pinecone, Azure AI Search, and Copilot Studio actually ship?

Vendor security for vector databases is scoped above the embedding: it controls who can call an index, not which embeddings a given query may return. This is the gap the paper names, and it is visible in shipping products.

Pinecone’s product page markets enterprise security as SSO, role-based access control, hierarchical encryption keys, private networking, namespaces, and roles assigned to users, service accounts, and API keys. These are controls over the caller: who can authenticate, which index they can reach, whether the connection is private. None of them constrain what a correctly authenticated query returns from within a namespace. A namespace can hold embeddings from many source documents and many users, and a query authorized against that namespace sees the whole namespace.

Microsoft’s Copilot Studio documentation illustrates the same scoping. Its RAG guidance enumerates the search providers (Bing, SharePoint, Graph, Dataverse, and Azure AI Search) and organizes knowledge sources into a table of authentication requirements and per-source constraints, noting that retrieval behavior “varies depending on factors like authentication, indexing, file formats, and storage constraints.” Authorization is framed at the source connection: which provider, authenticated how. The access decision sits at the source layer, not at the embedding.

The distinction matters for procurement. A security review that ticks “RBAC, SSO, encryption, private networking” has verified the perimeter. It has not verified that a user querying the index can only retrieve embeddings derived from documents they were authorized to read. Those are different properties, and vendor marketing tends to elide them.

What does the paper propose, and how much of it is measured?

The paper frames policy-aware, query-time filtering as the missing primitive: enforcement that intersects each ANN result set against the caller’s policy before embeddings return to the retrieval pipeline, rather than relying on output filtering or separate per-tenant indexes.

Two cautions on how to read it. First, it is a vision paper, not a measured exploit. Its own framing includes “preliminary findings” and “key open challenges”; it formalizes the policy model and compares enforcement strategies rather than benchmarking a working system against a deployed vector store. The leak path it describes is plausible and matches vendor documentation, but the paper does not report recall or latency numbers from a production trace. Second, its acceptance at SeQureDB 2026, co-located with SIGMOD 2026, puts the question on a database-systems venue agenda, which signals the field treats it as a real research problem, not that the problem is solved.

Why is per-embedding enforcement hard?

The paper frames an inherent three-way tension between correctly enforcing FGAC policies, achieving high ANN recall, and maintaining low query latency. Vector stores blend structured and unstructured attributes to return semantic, approximate results, which is the opposite of the relational model, where exact-match filtering and access-control predicates compose cheaply.

The tension is structural. ANN indexes trade exactness for speed by pruning the search space to a small set of candidate neighborhoods. A per-user policy is, in effect, an additional predicate the index was not built to apply. Applying it after retrieval means filtering top-k down to the authorized subset, which can leave too few results and force re-querying at a larger k, raising latency. Applying it during traversal means the index has to carry the policy in its structure, which conflicts with the assumption that one shared index serves many users. The paper’s open challenges live in this design space: how to push policies into the index without rebuilding it per user, and how to keep recall high when the authorized subset is sparse.
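The post-retrieval side of this tension can be sketched in a few lines. Everything here is an assumption for illustration: `CORPUS` is a hypothetical set of one-dimensional "embeddings", `nearest` is an exact ranking standing in for an ANN index, and the doubling schedule for the fetch size is one arbitrary choice among many. The point is only that a sparse authorized subset forces repeated, wider searches.

```python
# Hypothetical corpus: document id -> 1-D "embedding" (a stand-in for real vectors).
CORPUS = {f"doc{i}": float(i) for i in range(20)}

def nearest(query, fetch):
    """Exact nearest neighbors by absolute distance; stands in for an ANN index."""
    ranked = sorted(CORPUS, key=lambda doc_id: abs(CORPUS[doc_id] - query))
    return ranked[:fetch]

def post_filtered_search(query, allowed, k):
    """Fetch top-`fetch`, drop unauthorized ids, and widen until k survive."""
    fetch, rounds = k, 0
    while True:
        rounds += 1
        candidates = nearest(query, fetch)
        authorized = [doc_id for doc_id in candidates if doc_id in allowed]
        # Stop when enough authorized results exist or the corpus is exhausted.
        if len(authorized) >= k or fetch >= len(CORPUS):
            return authorized[:k], rounds
        fetch *= 2  # each extra round is added latency

# The caller may read only three documents, none of them near the query.
allowed = {"doc0", "doc15", "doc19"}
results, rounds = post_filtered_search(query=10.0, allowed=allowed, k=2)
print(results, rounds)
```

With this data the search needs five rounds and ends up scanning the whole corpus to find two authorized results, which is the latency cost of filtering after retrieval in miniature. A real ANN index would add a recall cost on top, since widening k does not guarantee the authorized neighbors are among the candidates.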

These are proposed problems with proposed directions, not numbers. Anyone quoting a specific recall or latency cost for policy-aware search is inventing it; the paper does not provide those figures.

What should practitioners do until the primitive exists?

Until vector stores enforce per-embedding policies natively, the burden falls on application code. Three mitigations are in use today, each with a different cost profile.

Tenant-scoped indexes are the bluntest instrument: one index per tenant or per policy equivalence class, so a query against tenant A’s index physically cannot reach tenant B’s embeddings. This closes the leak by construction but multiplies index count, complicates cross-tenant features, and degrades when the tenant count is large or the isolation boundary is finer than a tenant.
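A minimal sketch of the tenant-scoped layout, assuming an in-memory store: `TenantScopedStore` and its methods are hypothetical names, and a sorted scan by Euclidean distance stands in for a per-tenant ANN index.

```python
import math
from collections import defaultdict

class TenantScopedStore:
    """One separate index per tenant; a search only ever touches one of them."""

    def __init__(self):
        # tenant -> list of (chunk_id, vector)
        self._indexes = defaultdict(list)

    def add(self, tenant, chunk_id, vector):
        self._indexes[tenant].append((chunk_id, vector))

    def search(self, tenant, query, k=3):
        # .get() avoids creating an empty index for an unknown tenant.
        index = self._indexes.get(tenant, [])
        ranked = sorted(index, key=lambda row: math.dist(query, row[1]))
        return [chunk_id for chunk_id, _ in ranked[:k]]

store = TenantScopedStore()
store.add("tenant_a", "a-1", [1.0, 0.0])
store.add("tenant_a", "a-2", [0.0, 1.0])
store.add("tenant_b", "b-1", [0.4, 0.6])

# The query coincides exactly with tenant B's vector, yet B's chunk is unreachable.
hits = store.search("tenant_a", [0.4, 0.6], k=3)
print(hits)
```

The isolation comes from the data layout rather than from a check, which is why it cannot be forgotten at query time, and also why it cannot express anything finer than the key used to partition the indexes.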

Pre-retrieval ACLs are the pattern Microsoft’s SharePoint path follows. The application resolves the user’s authorized document set before retrieval and constrains the vector search to that set, typically by filtering on a metadata field carrying the document id or ACL. This keeps the leak closed but pushes the access-control decision back into application code, and it requires that the ACL metadata on every embedding stay in sync with the source system’s policy.
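As a rough sketch of the pre-retrieval pattern, assuming the authorized document set can be resolved as a simple lookup: `ACL`, `CHUNKS`, and `search_with_acl` are hypothetical, and in a real vector store the candidate restriction would be expressed as a metadata filter on the query rather than a Python list comprehension.

```python
import math

# Hypothetical ACL resolved from the source system: user -> readable document ids.
ACL = {"alice": {"doc-1", "doc-2"}, "bob": {"doc-3"}}

# Each chunk carries the id of the document it was derived from as metadata.
CHUNKS = [
    {"chunk_id": "c1", "doc_id": "doc-1", "vec": [1.0, 0.0]},
    {"chunk_id": "c2", "doc_id": "doc-2", "vec": [0.0, 1.0]},
    {"chunk_id": "c3", "doc_id": "doc-3", "vec": [0.6, 0.8]},
]

def search_with_acl(user, query, k=2):
    """Resolve the user's document set first, then rank only those chunks."""
    allowed_docs = ACL.get(user, set())  # unknown users see nothing
    candidates = [c for c in CHUNKS if c["doc_id"] in allowed_docs]
    ranked = sorted(candidates, key=lambda c: math.dist(query, c["vec"]))
    return [c["chunk_id"] for c in ranked[:k]]

# The query matches c3 exactly, but alice is not authorized for doc-3.
print(search_with_acl("alice", [0.6, 0.8]))
```

The search never ranks a chunk the user cannot read, so there is no under-filled result to repair afterwards. The correctness of the whole scheme rests on `doc_id` and the ACL lookup matching the source system at query time.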

Post-retrieval re-checks treat every retrieved chunk as untrusted until re-authorized against the source policy. This is the cheapest to bolt on and the easiest to get wrong: it assumes the application can, for every chunk, look up the policy decision the vector store skipped, within the latency budget the user expects. It is a defense-in-depth layer, not a primary control.
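A sketch of the re-check step, assuming the source policy is available as a lookup keyed by document id: `SOURCE_POLICY` and `recheck` are hypothetical, and the fail-closed handling of unknown documents is a design choice of this example, not something the paper prescribes.

```python
# Hypothetical source-of-truth policy: document id -> users allowed to read it.
SOURCE_POLICY = {
    "doc-1": {"alice"},
    "doc-2": {"alice", "bob"},
}

def recheck(user, retrieved):
    """Re-authorize each retrieved chunk against the source policy."""
    kept, dropped = [], []
    for chunk in retrieved:
        readers = SOURCE_POLICY.get(chunk["doc_id"])  # None if the source no longer knows the doc
        if readers is not None and user in readers:
            kept.append(chunk)
        else:
            # Fail closed: missing policy is treated the same as a denial.
            dropped.append(chunk["chunk_id"])
    return kept, dropped

# Chunks as they might come back from an unfiltered vector search.
retrieved = [
    {"chunk_id": "c1", "doc_id": "doc-1"},
    {"chunk_id": "c2", "doc_id": "doc-2"},
    {"chunk_id": "c9", "doc_id": "doc-9"},  # source document unknown
]
kept, dropped = recheck("bob", retrieved)
print([c["chunk_id"] for c in kept], dropped)
```

Only the kept chunks should reach the prompt. The dropped list is worth logging, since a non-empty one is direct evidence that the retrieval layer returned content the caller could not read.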

The vector store has become a second access-control boundary, and most deployments have not noticed. Postgres row-level security protected the source row. The embedding that row became is governed by whoever holds an API key to the index, and by whatever application code re-checks authorization after retrieval. The paper’s contribution is the naming of a gap that vendor marketing has been papering over with perimeter security. Naming it is a prerequisite to closing it, and closing it will cost recall, latency, or index count. Nobody has measured which yet.

Frequently Asked Questions

How do SharePoint and Azure AI Search differ as Copilot Studio RAG sources?

Within the same Copilot Studio RAG guidance, SharePoint retrieval applies security trimming so results include only content the user can read, while the Azure AI Search vector source is documented as having no security trimming and no authentication requirement for the user. Two adjacent knowledge sources in one product sit on opposite sides of the access-control gap the paper names.

What is the most common failure mode for pre-retrieval ACLs?

The embedding’s ACL metadata goes stale when a document is reclassified in the source system. A row moved from public to confidential in Postgres does not retract its embedding from the vector index until a re-embedding job runs, so the old authorization posture leaks until reindex. The control is only as current as the last sync.
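One way to narrow that window is to detect drift explicitly instead of waiting for the next reindex. The sketch below assumes each embedding's metadata records a `policy_version` copied from the source at embed time. That field, along with `SOURCE`, `INDEX_META`, and `find_stale`, is a hypothetical convention for illustration, not a feature of any particular vector store.

```python
# Hypothetical current state of the source system.
SOURCE = {
    "doc-1": {"label": "confidential", "policy_version": 2},  # reclassified after embedding
    "doc-2": {"label": "public", "policy_version": 1},
}

# Hypothetical metadata stored alongside each embedding at embed time.
INDEX_META = {
    "c1": {"doc_id": "doc-1", "label": "public", "policy_version": 1},
    "c2": {"doc_id": "doc-2", "label": "public", "policy_version": 1},
}

def find_stale(index_meta, source):
    """Return chunk ids whose recorded policy version no longer matches the source."""
    stale = []
    for chunk_id, meta in index_meta.items():
        current = source.get(meta["doc_id"])
        # A deleted source document or a version mismatch both count as stale.
        if current is None or current["policy_version"] != meta["policy_version"]:
            stale.append(chunk_id)
    return sorted(stale)

stale_chunks = find_stale(INDEX_META, SOURCE)
print(stale_chunks)
```

Chunks flagged this way can be excluded from results or queued for a metadata refresh. The check shortens the exposure window only as far as it is run, so it complements a sync job rather than replacing one.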

Does using pgvector inside Postgres avoid the gap?

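Largely, yes. Because pgvector stores embeddings as columns in ordinary Postgres tables, row-level security predicates apply to vector searches the same way they apply to any other query. The cost shows up in recall and latency rather than correctness: pgvector’s approximate indexes (HNSW and IVFFlat) apply filters after the index scan, so a search can return fewer rows than requested when the authorized subset is sparse, and the extension offers less latency tuning than a standalone vector store. Access-control correctness comes with a performance ceiling.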

Is this leak the same as a prompt injection attack?

No. Prompt injection corrupts the instructions the LLM executes by way of retrieved or user-supplied text. The FGAC leak occurs upstream, at the ANN matcher, before the LLM receives any chunks. The two require separate controls: injection is a generation-layer problem, unauthorized retrieval is a database-layer one.