Deploy a GenAI web app
A GenAI web app — a customer support assistant, an internal copilot, an “AI feature” inside a SaaS — is a different shape from a local coding agent. It serves many users at once, each session has different scope, and the security boundary needs to enforce per-tenant rules without becoming a bottleneck.
This guide shows how to put such an app behind OpenFirma. It builds on every other guide; cross-links are inline. The example app: a multi-tenant assistant where each user has their own session and the app calls OpenAI and a vendor SaaS on their behalf.
Architecture
    ┌──────────────────┐                ┌─────────────────────────┐
    │  HTTPS clients   │──── HTTPS ────►│  Web app (Node/Py/Go)   │
    └──────────────────┘                │  issues per-session     │
                                        │  capabilities, makes    │
                                        │  outbound LLM calls     │
                                        └────────┬────────────────┘
                                                 │ HTTP_PROXY=...
                                                 ▼
                                        ┌─────────────────────────┐
                                        │  Sidecar                │ ◄── per-pod or per-host
                                        │  (enforcement + inject) │
                                        └────────┬────────────────┘
                                                 │
                                                 ├──► api.openai.com       (allowed)
                                                 ├──► api.acme-vendor.com  (allowed)
                                                 └──X paste.rs             (denied)

                                        ┌─────────────────────────┐
                                        │  Authority              │ ◄── shared by all Sidecars
                                        │  (issuance + bundles)   │
                                        └─────────────────────────┘

Three deployment shapes for the Sidecar:
- Per-pod sidecar (Kubernetes / containers). Standard sidecar pattern: each app pod has a Sidecar container, the app talks to it over loopback. Right for production — strong tenant isolation, easy to scale horizontally.
- Per-host daemon (VM / bare metal). One Sidecar per host, all app processes on the host route through it. Right for simpler topologies.
- Embedded (gRPC interceptor mode). Sidecar in-process with the app via the gRPC interceptor. Right when you control the app’s HTTP client and want zero proxy footprint.
The rest of this guide uses per-pod sidecar. The patterns translate to the other shapes with minor config changes.
Step 1: Define the agent identity model
In a multi-tenant web app, “the agent” is not the app. The agent is the session — what the app is doing on behalf of one user. The choice that matters most:
| Choice | Meaning | Use when |
|---|---|---|
| agent_id = <tenant_id> | One agent identity per tenant; sessions are sub-units. | Per-tenant policies, shared models. |
| agent_id = <user_id> | One agent identity per end user. | Strict per-user audit isolation. |
| agent_id = <app_name> | A single agent identity for the whole app. | Single-tenant; minimal isolation. |
This guide uses agent_id = <tenant_id> and session_id = <user-session-uuid>. That gives you per-tenant policies plus per-session isolation in the audit log.
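As a minimal sketch of this identity model, the helper below derives both identifiers for a new user session. `SessionIdentity` and `new_session_identity` are hypothetical names used for illustration and are not part of OpenFirma; only the `tenant-<id>` prefix follows the examples in this guide.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionIdentity:
    """Identity pair carried by a session (hypothetical type)."""
    agent_id: str    # one per tenant, e.g. "tenant-acme"
    session_id: str  # one per user session


def new_session_identity(tenant_id: str) -> SessionIdentity:
    # agent_id is stable per tenant; session_id is a fresh UUID per user session.
    return SessionIdentity(
        agent_id=f"tenant-{tenant_id}",
        session_id=str(uuid.uuid4()),
    )
```

Two sessions for the same tenant share an `agent_id` but never a `session_id`, which is what lets the audit log separate them.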
Step 2: Set up shared infrastructure
One Authority for the whole deployment. It signs every capability your app mints, streams policy bundles, broadcasts revocations.

    # /etc/firma/firma.toml — the [authority] section
    [authority]
    listen_addr = "0.0.0.0:50051"   # or behind an internal load balancer
    policy_dir = "/etc/firma/policies"
    issuance_policy_dir = "/etc/firma/issuance"
    revocation_file = "/var/lib/firma/revocations.txt"
    key_file = "/etc/firma/firma-authority.key"
    max_ttl_seconds = 3600          # capabilities live at most 1h
    bundle_ttl_seconds = 30         # push bundle updates every 30s
    log_level = "info"

In production, run the Authority on a hardened host with limited access. Treat its signing key with the same care as a CA key.
The CA for HTTPS MITM (if you’re using it) lives separately on each Sidecar host — see Enable HTTPS MITM. Each host has its own CA; you don’t share one across hosts.
Step 3: Write the issuance policy
This is the policy that decides whether the app can ever mint a capability. It runs once per session, so it can afford richer checks.
/etc/firma/issuance/issuance.cedar:

    // Tenant-scoped agents may request these classes.
    permit (
      principal,
      action in [
        Firma::Action::"model.inference.chat",
        Firma::Action::"communication.external.send"
      ],
      resource
    ) when {
      // tenant ids we recognize
      principal == Firma::Agent::"tenant-acme" ||
      principal == Firma::Agent::"tenant-globex" ||
      principal == Firma::Agent::"tenant-soylent"
    };

    // No tenant gets payment classes from this app.
    forbid (
      principal,
      action == Firma::Action::"payment.transfer",
      resource
    );

The tenant list itself is declarative. When you onboard a new tenant, you add a line and push the bundle — no code change. When you offboard one, you remove the line and revoke their active capabilities.
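If the tenant list lives in a registry of your own, the `when` body can be generated rather than edited by hand. The following is a sketch under that assumption; `render_tenant_clause` is a hypothetical helper, not an OpenFirma tool, and it only renders text for you to place in the policy file.

```python
def render_tenant_clause(tenant_ids: list[str]) -> str:
    """Render the tenant comparisons for the issuance policy's `when` block."""
    if not tenant_ids:
        raise ValueError("at least one tenant is required")
    # Sort and de-duplicate so the rendered policy is stable across runs.
    lines = [
        f'principal == Firma::Agent::"tenant-{tenant}"'
        for tenant in sorted(set(tenant_ids))
    ]
    return " ||\n".join(lines)
```

A stable ordering keeps policy diffs small when a tenant is added or removed.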
Step 4: Write the runtime policy
/etc/firma/policies/genai-app.cedar:

    // LLM calls: permitted to OpenAI for known tenants.
    permit (
      principal,
      action == Firma::Action::"model.inference.chat",
      resource
    ) when {
      resource has "host" &&
      resource.host == "api.openai.com"
    };

    // Vendor SaaS: permitted to one specific endpoint, with rate limits
    // enforced via context.action_count.
    permit (
      principal,
      action == Firma::Action::"communication.external.send",
      resource
    ) when {
      resource has "host" &&
      resource.host == "api.acme-vendor.com" &&
      context.action_count <= 100
    };

    // Hard floor: no exfiltration destinations, ever.
    forbid (
      principal,
      action == Firma::Action::"communication.external.send",
      resource
    ) when {
      resource has "host" &&
      (resource.host == "paste.rs" ||
       resource.host == "transfer.sh" ||
       resource.host == "0x0.st")
    };

context.action_count is the per-session call counter. The rule “max 100 vendor calls per session” caps a runaway loop.
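To reason about how the three rules combine, here is a rough Python model of the decision. It is an approximation for illustration, not how the Sidecar evaluates policy: it relies only on two Cedar properties, that a matching forbid overrides any permit and that a request with no matching permit is denied. The `decide` function and its constants are hypothetical.

```python
CHAT = "model.inference.chat"
SEND = "communication.external.send"
EXFIL_HOSTS = {"paste.rs", "transfer.sh", "0x0.st"}
VENDOR_CALL_LIMIT = 100


def decide(action: str, host: str, action_count: int) -> str:
    """Approximate the runtime rules above; returns "allow" or "deny"."""
    # A matching forbid always wins over any permit.
    if action == SEND and host in EXFIL_HOSTS:
        return "deny"
    if action == CHAT and host == "api.openai.com":
        return "allow"
    if (action == SEND and host == "api.acme-vendor.com"
            and action_count <= VENDOR_CALL_LIMIT):
        return "allow"
    # No permit matched: denied by default.
    return "deny"
```

Under this model the 101st vendor call in a session is denied simply because no permit matches it any longer.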
Step 5: Configure the Sidecar
Each app pod runs a Sidecar with this config:

    # /etc/firma/firma.toml — the [sidecar.*] sections
    [sidecar.interceptor]
    mode = "http_proxy"
    listen_addr = "127.0.0.1:8080"
    drain_timeout_secs = 30

    [sidecar.interceptor.https_mitm]
    enabled = true
    intercept_hosts = ["api.openai.com", "api.acme-vendor.com"]
    strict_hosts = ["api.acme-vendor.com"]   # never fall back to CONNECT here

    [sidecar.ca]
    dir = "/etc/firma/firma-ca"

    [sidecar.mapping]
    rules_path = "/etc/firma/mapping-rules.toml"
    rules_paths = []
    default_protected = true   # production!

    [sidecar.policy]
    dir = "/etc/firma/cache/policies"   # populated by Authority stream
    authority_url = "https://firma-authority.internal:50051"

    [sidecar.constraint_enforcement]
    bundle_ttl_seconds = 90
    enforcement_timeout_ms = 50

    [sidecar.capability_seed]
    paths = []   # capabilities arrive via gRPC, not seed files

    [sidecar.authority]
    public_key_path = "/etc/firma/firma-authority.pub"
    ca_cert_path = "/etc/firma/authority-ca.crt"

    [sidecar.connector]
    default_timeout_ms = 30000

    [[sidecar.connector.hosts]]
    host = "api.openai.com"
    rps = 100
    burst = 20
    timeout_ms = 30000

    [[sidecar.connector.hosts]]
    host = "api.acme-vendor.com"
    rps = 50
    burst = 10
    timeout_ms = 15000

    [[sidecar.credentials]]
    host = "api.openai.com"
    mode = "vault"
    header = "Authorization"
    prefix = "Bearer "
    secret_path = "secret/data/openai/api-key"
    secret_key = "value"

    [[sidecar.credentials]]
    host = "api.acme-vendor.com"
    mode = "vault"
    header = "x-api-key"
    secret_path = "secret/data/acme-vendor/api-key"
    secret_key = "value"

    [sidecar.credentials.vault]
    addr = "https://vault.internal:8200"
    # token via AppRole, configured via env

    [sidecar.audit]
    sink = "grpc"
    grpc_url = "https://audit-collector.internal:9090"
    signing_key_path = "/etc/firma/audit.key"

    [sidecar.log]
    level = "info"

A few things worth highlighting:
- default_protected = true — anything not in mapping rules denies. Production posture.
- authority_url uses https:// + authority.ca_cert_path — sidecar verifies Authority identity before trusting streamed bundles/revocations.
- grpc audit sink — events go to a centralized collector, not to a local file. Multiple Sidecars feed one collector.
- Vault for credentials — no API keys on disk. The Sidecar pulls them on first use and caches in memory.
- strict_hosts on the vendor — if MITM fails (e.g. cert mismatch), the call denies rather than falling back to weaker CONNECT-only policy.
Step 6: Per-session capability issuance
The app’s request handler issues a fresh capability for each user session. Pseudocode (Python):

    import grpc
    from firma_proto import authority_pb2, authority_pb2_grpc

    def issue_capability_for_session(tenant_id: str, user_session_id: str) -> str:
        # TLS channel to the Authority; credentials are elided here.
        channel = grpc.secure_channel(
            "firma-authority.internal:50051",
            grpc.ssl_channel_credentials(...),
        )
        stub = authority_pb2_grpc.AuthorityStub(channel)
        req = authority_pb2.IssuanceRequest(
            agent_id=f"tenant-{tenant_id}",
            session_id=user_session_id,
            requested_actions=[
                "model.inference.chat",
                "communication.external.send",
            ],
            resource_scope="*",  # wildcard for brevity; see Common gotchas
            requested_ttl_seconds=900,  # 15 minutes
        )
        resp = stub.IssueCapability(req)
        # The issuance policy may refuse the request outright.
        if resp.HasField("denied"):
            raise PermissionError(resp.denied.reason)
        return resp.allowed.raw_token

When a user starts a session, the app calls issue_capability_for_session(...) and hands the resulting raw token to the Sidecar via the appropriate channel (a header on the proxied request, or a side channel; the design is yours). From then on the Sidecar can validate every call from that session against that capability.
For a 15-minute TTL with 1000 active sessions, the Authority issues 1000 capabilities every 15 minutes. The Sidecar holds them in its CapabilityMap. The hot path is unchanged.
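The load this puts on the Authority is simple arithmetic, shown here as a sketch. It assumes every active session renews exactly once per TTL and that renewals are spread evenly, which real traffic will only approximate; `issuance_rate_per_second` is a hypothetical helper.

```python
def issuance_rate_per_second(active_sessions: int, ttl_seconds: int) -> float:
    """Average issuance rate if every active session renews once per TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return active_sessions / ttl_seconds


# 1000 sessions with a 900-second TTL: a little over one issuance per second.
rate = issuance_rate_per_second(1000, 900)
```

Halving the TTL doubles this rate, which is the trade-off to weigh when shortening capability lifetimes.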
Step 7: Wire the app to the proxy
Set the app’s HTTP client to use the loopback Sidecar:

Python (httpx / requests):

    HTTP_PROXY=http://127.0.0.1:8080 \
    HTTPS_PROXY=http://127.0.0.1:8080 \
    SSL_CERT_FILE=/etc/firma/firma-ca/firma-ca.crt \
    REQUESTS_CA_BUNDLE=/etc/firma/firma-ca/firma-ca.crt \
    python -m gunicorn app:app

requests takes its CA bundle from REQUESTS_CA_BUNDLE rather than SSL_CERT_FILE, so both are set here.

Node:

    HTTPS_PROXY=http://127.0.0.1:8080 \
    NODE_EXTRA_CA_CERTS=/etc/firma/firma-ca/firma-ca.crt \
    node app.js

The app does not read OPENAI_API_KEY or vendor secrets. They live in Vault, the Sidecar pulls them, and the app just makes calls without auth headers.
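A startup check can catch a pod that was deployed without the proxy settings, or with a secret it should not hold. This is a sketch, not an OpenFirma feature: `preflight_problems` and the `FORBIDDEN_SECRETS` list are hypothetical, and the proxy address is the one used in this guide.

```python
from collections.abc import Mapping

SIDECAR_PROXY = "http://127.0.0.1:8080"
# Secrets that should live in Vault, not in the app's environment.
FORBIDDEN_SECRETS = ("OPENAI_API_KEY",)


def preflight_problems(env: Mapping[str, str]) -> list[str]:
    """List environment problems that would bypass or weaken the Sidecar."""
    problems = []
    if env.get("HTTPS_PROXY") != SIDECAR_PROXY:
        problems.append("HTTPS_PROXY does not point at the loopback Sidecar")
    for name in FORBIDDEN_SECRETS:
        if env.get(name):
            problems.append(f"{name} is set; the Sidecar should inject it instead")
    return problems
```

Calling `preflight_problems(os.environ)` at boot and refusing to start on a non-empty result is one way to use it.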
Step 8: Multi-tenancy in the audit log
Every request the app proxies produces an audit event tagged with agent_id = tenant-<id>, session_id = <user-session>. Ship those events to your collector keyed on agent_id and you have per-tenant audit by construction — no app-side instrumentation needed.
For per-user accounting on top of that, the session_id is the unit. If you record the mapping (session_id → user_id) somewhere, you can join the audit stream against it.
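A sketch of that join, assuming audit events arrive as dictionaries carrying at least `agent_id` and `session_id` and that the app keeps its own session-to-user map. The event shape beyond those two fields and the `calls_per_user` helper are assumptions for illustration.

```python
from collections import Counter


def calls_per_user(events: list[dict], session_to_user: dict[str, str]) -> Counter:
    """Count audit events per (agent_id, user_id) pair."""
    counts: Counter = Counter()
    for event in events:
        # Sessions missing from the map are grouped under "unknown".
        user_id = session_to_user.get(event["session_id"], "unknown")
        counts[(event["agent_id"], user_id)] += 1
    return counts
```

A growing "unknown" bucket is a sign that the session map is not being recorded for every issued capability.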
Step 9: Operational concerns
A few practices that come up only at production scale.
Authority HA. The Authority is a single point of contact for capability issuance. Run two of them behind a load balancer; both point at the same policy_dir and key file. The Sidecar’s gRPC stream is independent per-Sidecar, and Sidecars reconnect automatically.
Bundle propagation latency. A new policy version takes bundle_ttl_seconds to propagate worst-case (Sidecars pull, Authority pushes). Plan for this when rolling out tightening rules — start with a stricter rule, deploy, wait for propagation, only then announce the change to tenants.
Revocation propagation. A revocation issued with firma authority revocations add <token_id> propagates within a second on the gRPC stream. To kill a tenant immediately, revoke every active capability for that tenant.
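Revoking a whole tenant means knowing which token ids belong to it. The sketch below assumes the app records a token_id to agent_id mapping at issuance time (this guide does not say where that record lives) and builds one command line per token; `revocation_commands` is a hypothetical helper.

```python
def revocation_commands(active_tokens: dict[str, str], agent_id: str) -> list[list[str]]:
    """Build one `firma authority revocations add` argv per token owned by agent_id."""
    return [
        ["firma", "authority", "revocations", "add", token_id]
        for token_id, owner in sorted(active_tokens.items())
        if owner == agent_id
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a host that has the CLI and access to the Authority.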
Capacity planning. Each Sidecar holds active capabilities + the policy bundle in memory. With 1000 active sessions and a 100 KB bundle, you’re well under 100 MB resident. The hot path stays bounded by the perf budgets (Stage 1 < 1ms, Stage 2 < 200µs) regardless of session count.
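The guide does not give a per-capability size, so one way to sanity-check the estimate is to invert it: how large could each capability be before the 100 MB figure is exceeded? The sketch below does that arithmetic in decimal units. It ignores the Sidecar's own baseline memory, which is not quantified here, and `max_capability_bytes` is a hypothetical helper.

```python
def max_capability_bytes(resident_budget: int, bundle_bytes: int, sessions: int) -> float:
    """Largest average per-capability footprint that stays within the budget."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return (resident_budget - bundle_bytes) / sessions


# 100 MB budget, 100 KB bundle, 1000 sessions.
per_capability = max_capability_bytes(100_000_000, 100_000, 1000)
```

With these inputs each capability could average just under 100 KB before the budget is reached, so the claim holds as long as real capabilities are far smaller than that.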
Failure modes. If the Authority is unreachable for longer than the Sidecar’s bundle_ttl_seconds (90 seconds in the Step 5 config), the Sidecar denies everything (PolicyBundleStale). This is the right shape, because stale policy is not safe, but it makes the Authority a critical dependency for your app’s availability. Monitor accordingly.
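The fail-closed behavior can be pictured with a short sketch. This models the rule as described in this guide and is not the Sidecar's implementation; both function names are hypothetical, and timestamps are plain seconds such as values from `time.monotonic()`.

```python
def bundle_is_stale(last_refresh: float, now: float, bundle_ttl_seconds: float) -> bool:
    """True once the cached bundle is older than its TTL."""
    return (now - last_refresh) > bundle_ttl_seconds


def decide_with_bundle(last_refresh: float, now: float, bundle_ttl_seconds: float) -> str:
    # Fail closed: a stale bundle denies every request, whatever the policy says.
    if bundle_is_stale(last_refresh, now, bundle_ttl_seconds):
        return "deny: PolicyBundleStale"
    return "evaluate policy"
```

Alerting on the age of the last refresh well before it reaches the TTL gives you time to react before requests start failing.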
Tenant onboarding flow
Putting it all together, the new-tenant workflow is:
- Add Firma::Agent::"tenant-newco" to the issuance policy.
- Add any tenant-specific runtime rules (a permit with principal == Firma::Agent::"tenant-newco", etc.).
- Push the policy bundle. Sidecars pick it up within bundle_ttl_seconds.
- Configure the app to use agent_id = "tenant-newco" for that tenant’s sessions.
- First session for the tenant: app calls IssueCapability, gets a token, app makes calls, Sidecar validates.
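Offboarding is the inverse: remove the tenant’s entries from the issuance and runtime policies and push the bundle. New issuance requests for the tenant are then denied, and capabilities already issued expire with their TTL. Revoke them explicitly if access must end immediately.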
Common gotchas
PolicyBundleStale denials in production. Your Sidecars lost contact with the Authority. Check the network path, the Authority’s health, and consider raising bundle_ttl_seconds slightly to give yourself headroom for transient blips.
CapabilityScopeMismatch for legitimate calls. The capability’s resource_scope doesn’t cover the request. Adjust the scope requested at issuance time so that it covers the calls the session legitimately makes, and no more. Match the scope to the agent’s mission, not to a wildcard: '*' is a smell in production.
Audit volume. A busy app produces a lot of events. Plan for the storage and the cost of shipping them. grpc sink + a horizontally scaled collector is the right shape.
Vault token rotation. AppRole renewal needs to happen before the token expires; the Sidecar does not auto-renew. Use a sidecar-of-the-sidecar (e.g. Vault Agent) to keep credentials fresh.
Sidecar restart drops in-memory capabilities. When a pod restarts, the Sidecar comes back with no CapabilityMap entries until sessions issue new ones. The app should retry on CapabilityNotFound by re-issuing.
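One way to structure that retry is a small wrapper around each outbound call. This is a sketch under assumptions: the `CapabilityNotFound` exception class stands in for however your HTTP client surfaces that denial, and `call_with_reissue` is a hypothetical helper.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CapabilityNotFound(Exception):
    """Stand-in for the Sidecar's CapabilityNotFound denial as seen by the app."""


def call_with_reissue(
    do_call: Callable[[], T],
    reissue: Callable[[], None],
    max_reissues: int = 1,
) -> T:
    """Run do_call; on CapabilityNotFound, re-issue the capability and try again."""
    for _ in range(max_reissues):
        try:
            return do_call()
        except CapabilityNotFound:
            reissue()  # e.g. issue a new capability and hand it to the Sidecar
    # Final attempt; a further CapabilityNotFound propagates to the caller.
    return do_call()
```

Keeping `max_reissues` small avoids hammering the Authority when the failure has some other cause.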
What’s next
- Read & verify the audit log — operational practice for the multi-tenant log stream.
- Concepts: Threat model — what this protects against and what it doesn’t.
- Concepts: Connectors — for the per-host rate-limit and credential-injection details.