Secure Your LlamaIndex RAG Pipeline With AuthSec Delegation Tokens

The Problem

A LlamaIndex ingestion pipeline is a privileged workload. Its reader reaches into your internal APIs, pulls restricted records, and feeds them directly into a vector index that your LLM will query. If the credential the reader uses leaks (through a log line, a trace, or a misconfigured .env file), everything that credential could access is now open to whoever finds it.

Static API keys make the blast radius enormous:

They don't expire on their own
They carry no identity — you can't tell which pipeline used them
They grant everything the key allows, not just what the reader needs
Revoking them takes down every other process sharing the same key

The Idea: Reader-Native Delegation

AuthSec issues short-lived, RS256-signed JWTs scoped to specific permissions. Instead of storing a credential, your AuthSecSecureReader requests a token at ingestion time, uses it for that call, and lets it expire. The raw credential for the downstream API never touches your pipeline at all.

AuthSecSecureReader.load_data()
        │
        ▼
AuthSec /delegation-token endpoint
  (verified by client ID)
        │
        ▼
Short-lived RS256 JWT
  (scoped, SPIFFE-identified, TTL ~30 min)
        │
        ▼
Protected API  ──►  JSON records
        │
        ▼
LlamaIndex Document objects
        │
        ▼
VectorStoreIndex  ──►  RAG queries
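
To make the token in this flow concrete, here is a minimal sketch that decodes a JWT payload and reads its scope and remaining lifetime. The claim names (sub, scope, iat, exp), the SPIFFE ID, and the sample token are assumptions for illustration; the article does not show AuthSec's actual claim layout. The sketch deliberately skips RS256 signature verification, which a real consumer must perform with a JWT library and the issuer's public key.

```python
import base64
import json
import time
from typing import Optional


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


def seconds_remaining(claims: dict, now: Optional[float] = None) -> float:
    """Time left before expiry; negative once the token has expired."""
    now = time.time() if now is None else now
    return claims["exp"] - now


def _b64url_json(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    return base64.urlsafe_b64encode(json.dumps(data).encode()).rstrip(b"=").decode()


# Hypothetical token assembled locally; the signature is only a placeholder.
issued_at = 1_700_000_000
sample_token = ".".join([
    _b64url_json({"alg": "RS256", "typ": "JWT"}),
    _b64url_json({
        "sub": "spiffe://example.org/pipelines/metrics-ingest",  # assumed SPIFFE ID
        "scope": "read:metrics",
        "iat": issued_at,
        "exp": issued_at + 30 * 60,  # roughly the 30 minute TTL from the diagram
    }),
    "placeholder-signature",
])

claims = decode_claims(sample_token)
print(claims["sub"], claims["scope"])
print(seconds_remaining(claims, now=issued_at + 600))
```

Ten minutes after issue, the sample token has twenty minutes left; once the expiry time has passed, the remaining time goes negative and the token should be rejected.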

The AuthSecSecureReader is a standard LlamaIndex BaseReader subclass. Drop it into any ingestion pipeline and it just works — the token lifecycle is handled inside the SDK.

Install

pip install authsec-llamaindex

For full LlamaIndex support:

pip install "authsec-llamaindex[llamaindex]"

Ingest from a Protected Endpoint

from authsec_llamaindex import AuthSecSecureReader
from llama_index.core import VectorStoreIndex

reader = AuthSecSecureReader()

docs = reader.load_data(
    endpoint="secure-vault/metrics",
    scope="read:metrics",
)

index = VectorStoreIndex.from_documents(docs)
engine = index.as_query_engine()

response = engine.query("What is the Q2 revenue target?")
print(response)

That's the full integration. No token management, no credential storage, no manual Authorization header construction. The reader handles it.

What Happens Under the Hood

When load_data() is called, the SDK executes a five-step exchange that bakes security context directly into every Document:

1. AuthSecClient sends GET /authsec/uflow/sdk/delegation-token with the agent's client ID.
2. AuthSec verifies the identity and returns a signed JWT carrying the requested scope and a SPIFFE subject binding.
3. The SDK sends the downstream API request with Authorization: Bearer <token>.
4. The JSON response is parsed into LlamaIndex Document objects, each stamped with security metadata.
5. Documents flow into your VectorStoreIndex as normal.
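
The exchange can be sketched in plain Python with the network calls injected as functions. This is not the SDK's implementation: load_secure_documents, the fake transports, and the Document stand-in are hypothetical names used so the example runs without AuthSec or LlamaIndex installed. Only the metadata keys source_endpoint and record_index come from the article.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Stand-in for a LlamaIndex Document so the sketch runs without the library."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_secure_documents(
    endpoint: str,
    scope: str,
    fetch_token: Callable[[str], str],
    fetch_records: Callable[[str, Dict[str, str]], List[dict]],
) -> List[Document]:
    # Steps 1-2: obtain a signed, scoped token from the token service.
    token = fetch_token(scope)
    # Step 3: call the protected API with the token as a bearer credential.
    records = fetch_records(endpoint, {"Authorization": f"Bearer {token}"})
    # Step 4: wrap each JSON record in a Document stamped with provenance.
    # Step 5 (indexing the returned documents) is left to the caller.
    return [
        Document(
            text=json.dumps(record, sort_keys=True),
            metadata={"source_endpoint": endpoint, "record_index": i},
        )
        for i, record in enumerate(records)
    ]


# Hypothetical in-memory transports standing in for the real HTTP calls.
seen_headers: List[Dict[str, str]] = []


def fake_fetch_token(scope: str) -> str:
    return f"fake-token-for-{scope}"


def fake_fetch_records(endpoint: str, headers: Dict[str, str]) -> List[dict]:
    seen_headers.append(headers)  # record what the API would have received
    return [{"id": 1, "note": "placeholder"}, {"id": 2, "note": "placeholder"}]


docs = load_secure_documents(
    "secure-vault/metrics", "read:metrics", fake_fetch_token, fake_fetch_records
)
for doc in docs:
    print(doc.metadata, doc.text)
```

Injecting the two transports keeps the token request and the data request separately testable, and makes it easy to confirm that the bearer header is the only credential the data call ever sees.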

The token log in a live run:

[AuthSec SDK] [Mode] LIVE — using official authsec-langchain-sdk
  |- Base URL  : https://prod.api.authsec.ai
  |- Client ID : fe6d5a81-58ac-4c4b-85fa-f84b6c9cb73d
[AuthSec SDK] [Delegation] Requesting delegation token via official SDK...
[AuthSec SDK] [Success] LIVE delegation token acquired via official SDK.
  |- Token  : eyJhbGciOiJSUzI1NiIsInR5...
  |- Cache  : SDK caches token internally (auto-refreshes on expiry).
[LlamaIndex Loader] Successfully ingested 2 document(s) from secure storage!
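
The Cache line in that log says the SDK caches the token and refreshes it on expiry. How the SDK does this internally is not shown, so the following is only a sketch of the general pattern: TokenCache, its refresh_margin parameter, and the fake fetcher are hypothetical names, and the 1800 second lifetime mirrors the ~30 minute TTL mentioned earlier.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hold one token and refetch it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.time,
        refresh_margin: float = 60.0,
    ) -> None:
        self._fetch = fetch            # returns (token, expires_at in epoch seconds)
        self._clock = clock            # injectable so tests can control time
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Fetch on first use, or when the token is within the refresh margin.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token


# Demo with a fake clock and a fake token endpoint (both hypothetical).
now = [0.0]
fetch_times = []


def fake_fetch() -> Tuple[str, float]:
    fetch_times.append(now[0])
    return f"token-{len(fetch_times)}", now[0] + 1800


cache = TokenCache(fake_fetch, clock=lambda: now[0])
first = cache.get()    # nothing cached yet, so this fetches
now[0] = 900.0
second = cache.get()   # still well inside the lifetime, served from cache
now[0] = 1750.0
third = cache.get()    # inside the 60 second margin, so this refetches
print(first, second, third)
```

Refreshing slightly before expiry avoids sending a token that dies in flight between the reader and the protected API.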

Each document the reader produces carries provenance metadata that your downstream retrieval logic can filter or gate on:

{
    "source_endpoint":    "secure-vault/records",
    "ingestion_auth":     "AuthSec AI Verified",
    "clearance_required": "TopSecret",
    "record_index":       0,
}
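
As one example of gating on this metadata, the sketch below drops documents whose clearance_required exceeds the caller's clearance. The clearance ladder and the function names are assumptions; only the clearance_required key and the TopSecret value appear in the article. Documents are shown as plain dicts so the example runs without LlamaIndex; with real Document objects you would read doc.metadata instead.

```python
from typing import List

# Assumed ordering from least to most restricted; adapt to your own labels.
CLEARANCE_ORDER = ["Public", "Internal", "Secret", "TopSecret"]


def is_visible(metadata: dict, user_clearance: str) -> bool:
    """True only if the user's clearance meets the document's requirement."""
    required = metadata.get("clearance_required")
    if required not in CLEARANCE_ORDER or user_clearance not in CLEARANCE_ORDER:
        return False  # fail closed on missing or unknown labels
    return CLEARANCE_ORDER.index(user_clearance) >= CLEARANCE_ORDER.index(required)


def filter_documents(docs: List[dict], user_clearance: str) -> List[dict]:
    """Keep only the documents the caller is cleared to retrieve."""
    return [doc for doc in docs if is_visible(doc["metadata"], user_clearance)]


# Placeholder documents for the demo.
docs = [
    {"text": "record A", "metadata": {"clearance_required": "TopSecret", "record_index": 0}},
    {"text": "record B", "metadata": {"clearance_required": "Internal", "record_index": 1}},
    {"text": "record C", "metadata": {"record_index": 2}},  # label missing
]

visible = filter_documents(docs, "Secret")
print([doc["text"] for doc in visible])
```

Failing closed matters here: a document that arrives without a clearance label is hidden rather than treated as public.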

What You Get

Replacing static keys with delegation tokens through authsec-llamaindex gives you five properties that long-lived credentials simply cannot provide:

Ephemeral credentials: each delegation token expires roughly 30 minutes after it is issued, so a token that turns up in a log file later is already dead.
Least-privilege per reader call: a token scoped to read:metrics cannot read records. The scope travels with the token and is enforced at the API, not in the pipeline code.
Full audit trail: every token request carries the agent's client ID. The AuthSec server knows which pipeline ingested what, when, and under which scope.
No rotation burden: the client ID is long-lived; the tokens it generates are not. Revoke access through the AuthSec dashboard and the next ingestion run gets a 401.
Drop-in BaseReader compatibility: AuthSecSecureReader extends LlamaIndex's BaseReader directly. It works with SimpleDirectoryReader pipelines, IngestionPipeline, custom orchestrators, and anything else that accepts a standard reader.
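
To illustrate what enforcing scope at the API can look like, here is a minimal sketch of a server-side check run after the token's signature has been verified. It assumes the scope claim is a space-delimited string, which is the common OAuth 2.0 convention; AuthSec's actual claim format and the function names here are assumptions.

```python
def has_scope(granted: str, required: str) -> bool:
    """Check an exact scope match in a space-delimited scope string."""
    return required in granted.split()


def authorize(claims: dict, required_scope: str, now: float) -> int:
    """Return the HTTP status a protected endpoint would answer with."""
    if claims.get("exp", 0) <= now:
        return 401  # expired or missing expiry: the caller is not authenticated
    if not has_scope(claims.get("scope", ""), required_scope):
        return 403  # authenticated, but the token was not issued for this resource
    return 200


# Hypothetical verified claims for a metrics-only token.
metrics_claims = {"scope": "read:metrics", "exp": 2_000}

print(authorize(metrics_claims, "read:metrics", now=1_000))
print(authorize(metrics_claims, "read:records", now=1_000))
print(authorize(metrics_claims, "read:metrics", now=3_000))
```

Matching whole scope strings rather than prefixes keeps a token for read:metrics from satisfying a check for a longer scope name that merely starts the same way.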

Trying It Without a Full Setup

No AuthSec account needed to explore the integration. Omit the environment variables and the SDK falls back to MOCK mode automatically:

from authsec_llamaindex import AuthSecSecureReader

reader = AuthSecSecureReader()

docs = reader.load_data(
    endpoint="secure-vault/records",
    scope="read:records",
)

for doc in docs:
    print(doc.text)
    print(doc.metadata)

You get realistic mock records, a locally-generated JWT, and the complete document parsing and metadata stamping — enough to build and test your downstream RAG logic before touching a real protected API.
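
How the SDK builds its mock token is not documented here, so the following is only a sketch of what a locally generated JWT can look like. It signs with HS256 and a throwaway secret because that needs nothing beyond the standard library; the live service uses RS256, which requires a cryptography package. The function names and claim values are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Unpadded base64url, as used for every JWT segment."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _sign(signing_input: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def make_mock_jwt(scope: str, secret: bytes, ttl_seconds: int = 1800,
                  now: Optional[int] = None) -> str:
    """Build an HS256-signed token for local testing only."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": "mock-agent", "scope": scope, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_mock_jwt(token: str, secret: bytes) -> dict:
    """Check the HMAC signature and return the claims, or raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(_sign(signing_input, secret), signature):
        raise ValueError("signature mismatch")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


secret = b"local-test-secret"  # throwaway value, never a production key
token = make_mock_jwt("read:records", secret, now=1_700_000_000)
claims = verify_mock_jwt(token, secret)
print(claims["scope"], claims["exp"] - claims["iat"])
```

A token like this lets downstream tests exercise expiry and scope handling without any network access, while staying clearly distinguishable from a production token by its alg header.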

Tagged: LlamaIndex, RAG, API Security, Delegation Tokens, JWT, SPIFFE

Bishnu

Engineering