Rotate Everything in Production (Part 3): Agent mTLS Login + App Leaf Certs (ncprd1, resume2me)
This part covers the workload-facing rotations:
- agent login mTLS certs (Vault Agent via
auth/cert) - app leaf cert rotation (per-app
nginx-*roles + reload)
Apps in scope: ncprd1, resume2me.
TL;DR
- Rotate agent mTLS certs and cert-auth mappings first.
- Then issue app leaf certs and reload via post-hook.
- Verify
*.fullchain.pemplus a manual cert login.
3) Agent mTLS login certs
What the run did (per app)
ncprd1
- CN:
agent-ncprd1.prod.privsec.ch - role:
pki-prod/roles/agent-mtls-ncprd1 - mapping:
auth/cert/certs/agent-ncprd1 - policies:
pki-issue-ncprd1,marker-cert-auth
Files written:
/home/ncprd1/vault/mtls/agent.key(0600)/home/ncprd1/vault/mtls/agent.crt(0644)/home/ncprd1/vault/ca/ca.pem(0644)
Validity (from run):
- notBefore:
Dec 22 08:12:20 2025 GMT - notAfter:
Jan 21 08:12:50 2026 GMT
resume2me
- CN:
agent-resume2me.prod.privsec.ch - role:
pki-prod/roles/agent-mtls-resume2me - mapping:
auth/cert/certs/agent-resume2me - policies:
pki-issue-resume2me,marker-cert-auth
Files written:
/home/resume2me/vault/mtls/agent.key(0600)/home/resume2me/vault/mtls/agent.crt(0644)/home/resume2me/vault/ca/ca.pem(0644)
Validity:
- notBefore:
Dec 22 08:12:21 2025 GMT - notAfter:
Jan 21 08:12:51 2026 GMT
Why this step exists
Agent rotation is more than “renew a file”. In this setup it means:
- issue a client cert under a per-app role
- ensure
auth/certis enabled - upsert the mapping at
auth/cert/certs/<agent-name> - ensure policies are attached (including your marker policy)
If mappings or policies drift, Vault Agent can look “healthy” but fail to auth or renew.
Quick verification: manual cert login
For ncprd1 (from the run):
VAULT_ADDR="https://127.0.0.1:22400" VAULT_CACERT="/home/ncprd1/vault/ca/ca.pem" \
vault login -method=cert -client-cert "/home/ncprd1/vault/mtls/agent.crt" -client-key "/home/ncprd1/vault/mtls/agent.key"
This single command exercises TLS trust, cert-auth mapping, and policy attachment.
5) App leaf cert rotation (nginx roles + reload)
ncprd1
The run:
- upserted PKI role:
nginx-ncprd1 - allowed domains:
int.privsec.ch,prod.privsec.ch - removed
debug-policyfrom mapping - enabled KV access (
--with-kv-access) - detected KV v2 on mount
kv - attached KV policy:
secret-agent-ncprd1-policy - strict auth:
Auth=cert (mTLS)only - detected
kv/ncprd1exists, so seed not overwritten - installed post-hook:
/home/ncprd1/.vault-agent-ncprd1/bin/vault-agent-post-leaf.sh
- restarted user service:
vault-agent-ncprd1.service(v5.0)
Leaf file check:
/home/ncprd1/tls/ncprd1.fullchain.pem- subject:
CN = ncprd1.int.privsec.ch - issuer:
PrivSec Intermediate CA (prod) - notAfter:
Dec 23 08:12:57 2025 GMT
resume2me
The run:
- upserted PKI role:
nginx-resume2me - allowed domains:
int.privsec.ch,prod.privsec.ch - removed
debug-policyfrom mapping - enabled KV access (
--with-kv-access) - KV v2 already active on
kv - attached KV policy:
secret-agent-resume2me-policy - strict auth: cert only
- detected
kv/resume2memissing (no seed because--with-kv-seedwas not set) - installed post-hook:
/home/resume2me/.vault-agent-resume2me/bin/vault-agent-post-leaf.sh
- restarted user service:
vault-agent-resume2me.service(v5.0)
Leaf file check:
/home/resume2me/tls/resume2me.fullchain.pem- subject:
CN = resume2me.int.privsec.ch - issuer:
PrivSec Intermediate CA (prod) - notAfter:
Dec 22 23:43:22 2025 GMT
Reload mechanism (post-hook)
The logs show Vault Agent executing a post-hook that triggers reloads by label via Podman:
RELOAD_TLS_LABEL='tls=true'PROXY_RELOAD='podman:proxyprod'
This keeps the sequence clean: issue -> write -> reload.
Watch item: agent/server version mismatch
Both vault-agent-* services reported:
- Vault Agent:
1.21.1 - Vault server:
1.20.4
That mismatch is not automatically broken, but it is a real operational signal. After upgrades, re-test auth, renewal, and template rendering.
Tiny post-rotation checklist (per app)
- [ ]
*.fullchain.pemexists where expected - [ ] expiry is sane
- [ ] subject CN matches expected domain
- [ ] Vault Agent service is
active (running) - [ ] cert-auth login works with generated agent cert
Next up
Part 4 covers the edge and the failure you saw:
- proxy CA chain refresh (
proxyprod) - proxy + Vault container restarts
- why the final healthcheck returned “connection reset by peer” and what to do right after