Rotate Everything in Production (Part 3): Agent mTLS Login and App Leaf Certs
This part covers the workload-facing rotations:
- agent login mTLS certs (Vault Agent via
auth/cert) - app leaf cert rotation (per-app
nginx-*roles + reload)
Apps in scope: two production workloads with their own agent identities and leaf-certificate targets.
TL;DR
- Rotate agent mTLS certs and cert-auth mappings first.
- Then issue app leaf certs and reload via post-hook.
- Verify
*.fullchain.pemplus a manual cert login.
3) Agent mTLS login certs
What the run did (per app)
Workload A
- environment-specific agent CN
- per-app agent mTLS role
- one
auth/certmapping for that agent identity - app-specific issuance policy plus a small marker/debug policy if you use one
Files written:
/home/<app-a>/vault/mtls/agent.key(0600)/home/<app-a>/vault/mtls/agent.crt(0644)/home/<app-a>/vault/ca/ca.pem(0644)
Validity (from run):
- short-lived and recent enough to replace the old agent certificate cleanly
Workload B
- environment-specific agent CN
- per-app agent mTLS role
- one
auth/certmapping for that agent identity - app-specific issuance policy plus a small marker/debug policy if you use one
Files written:
/home/<app-b>/vault/mtls/agent.key(0600)/home/<app-b>/vault/mtls/agent.crt(0644)/home/<app-b>/vault/ca/ca.pem(0644)
Validity:
- short-lived and recent enough to replace the old agent certificate cleanly
Why this step exists
Agent rotation is more than “renew a file”. In this setup it means:
- issue a client cert under a per-app role
- ensure
auth/certis enabled - upsert the mapping at
auth/cert/certs/<agent-name> - ensure policies are attached (including your marker policy)
If mappings or policies drift, Vault Agent can look “healthy” but fail to auth or renew.
Quick verification: manual cert login
For one workload (pattern):
Validate one workload login path with that service user's CA file and mTLS client certificate.
This single command exercises TLS trust, cert-auth mapping, and policy attachment.
5) App leaf cert rotation (nginx roles + reload)
Workload A
The run:
- upserted a per-app PKI role for the frontend leaf
- allowed only the intended internal and environment-specific domains
- removed
debug-policyfrom mapping - enabled KV access (
--with-kv-access) - detected KV v2 on the configured mount
- attached the app-specific KV policy
- strict auth:
Auth=cert (mTLS)only - detected the app KV path already exists, so seed not overwritten
- installed the post-hook that writes the rendered leaf
- restarted the corresponding user service
Leaf file check:
/home/<app-a>/tls/<app-a>.fullchain.pem- subject CN matches the app's internal/service domain
- issuer matches the expected intermediate CA
- expiry is sane
Workload B
The run:
- upserted a per-app PKI role for the frontend leaf
- allowed only the intended internal and environment-specific domains
- removed
debug-policyfrom mapping - enabled KV access (
--with-kv-access) - KV v2 already active on the configured mount
- attached the app-specific KV policy
- strict auth: cert only
- detected the app KV path missing (no seed because KV seeding was not enabled)
- installed the post-hook that writes the rendered leaf
- restarted the corresponding user service
Leaf file check:
/home/<app-b>/tls/<app-b>.fullchain.pem- subject CN matches the app's internal/service domain
- issuer matches the expected intermediate CA
- expiry is sane
Reload mechanism (post-hook)
The logs show Vault Agent executing a post-hook that triggers reloads by label via Podman:
RELOAD_TLS_LABEL='tls=true'PROXY_RELOAD='podman:<edge-proxy>'
This keeps the sequence clean: issue -> write -> reload.
Watch item: agent/server version mismatch
Both vault-agent-* services reported:
- Vault Agent:
1.21.1 - Vault server:
1.20.4
That mismatch is not automatically broken, but it is a real operational signal. After upgrades, re-test auth, renewal, and template rendering.
Tiny post-rotation checklist (per app)
- [ ]
*.fullchain.pemexists where expected - [ ] expiry is sane
- [ ] subject CN matches expected domain
- [ ] Vault Agent service is
active (running) - [ ] cert-auth login works with generated agent cert
Next up
Part 4 covers the edge and the failure you saw:
- proxy CA chain refresh for the edge proxy
- proxy + Vault container restarts
- why the final healthcheck returned “connection reset by peer” and what to do right after