This part covers the workload-facing rotations:

  • agent login mTLS certs (Vault Agent via auth/cert)
  • app leaf cert rotation (per-app nginx-* roles + reload)

Apps in scope: two production workloads with their own agent identities and leaf-certificate targets.


TL;DR

  • Rotate agent mTLS certs and cert-auth mappings first.
  • Then issue app leaf certs and reload via post-hook.
  • Verify *.fullchain.pem plus a manual cert login.

3) Agent mTLS login certs

What the run did (per app)

Workload A

  • environment-specific agent CN
  • per-app agent mTLS role
  • one auth/cert mapping for that agent identity
  • app-specific issuance policy plus a small marker/debug policy if you use one

Files written:

  • /home/<app-a>/vault/mtls/agent.key (0600)
  • /home/<app-a>/vault/mtls/agent.crt (0644)
  • /home/<app-a>/vault/ca/ca.pem (0644)

Validity (from run):

  • short-lived and recent enough to replace the old agent certificate cleanly

Workload B

  • environment-specific agent CN
  • per-app agent mTLS role
  • one auth/cert mapping for that agent identity
  • app-specific issuance policy plus a small marker/debug policy if you use one

Files written:

  • /home/<app-b>/vault/mtls/agent.key (0600)
  • /home/<app-b>/vault/mtls/agent.crt (0644)
  • /home/<app-b>/vault/ca/ca.pem (0644)

Validity:

  • short-lived and recent enough to replace the old agent certificate cleanly

Why this step exists

Agent rotation is more than “renew a file”. In this setup it means:

  • issue a client cert under a per-app role
  • ensure auth/cert is enabled
  • upsert the mapping at auth/cert/certs/<agent-name>
  • ensure policies are attached (including your marker policy)

If mappings or policies drift, Vault Agent can look “healthy” but fail to auth or renew.

Quick verification: manual cert login

For one workload (pattern):

Validate one workload login path with that service user's CA file and mTLS client certificate.

This single command exercises TLS trust, cert-auth mapping, and policy attachment.


5) App leaf cert rotation (nginx roles + reload)

Workload A

The run:

  • upserted a per-app PKI role for the frontend leaf
  • allowed only the intended internal and environment-specific domains
  • removed debug-policy from mapping
  • enabled KV access (--with-kv-access)
  • detected KV v2 on the configured mount
  • attached the app-specific KV policy
  • strict auth: Auth=cert (mTLS) only
  • detected the app KV path already exists, so seed not overwritten
  • installed the post-hook that writes the rendered leaf
  • restarted the corresponding user service

Leaf file check:

  • /home/<app-a>/tls/<app-a>.fullchain.pem
  • subject CN matches the app's internal/service domain
  • issuer matches the expected intermediate CA
  • expiry is sane

Workload B

The run:

  • upserted a per-app PKI role for the frontend leaf
  • allowed only the intended internal and environment-specific domains
  • removed debug-policy from mapping
  • enabled KV access (--with-kv-access)
  • KV v2 already active on the configured mount
  • attached the app-specific KV policy
  • strict auth: cert only
  • detected the app KV path missing (no seed because KV seeding was not enabled)
  • installed the post-hook that writes the rendered leaf
  • restarted the corresponding user service

Leaf file check:

  • /home/<app-b>/tls/<app-b>.fullchain.pem
  • subject CN matches the app's internal/service domain
  • issuer matches the expected intermediate CA
  • expiry is sane

Reload mechanism (post-hook)

The logs show Vault Agent executing a post-hook that triggers reloads by label via Podman:

  • RELOAD_TLS_LABEL='tls=true'
  • PROXY_RELOAD='podman:<edge-proxy>'

This keeps the sequence clean: issue -> write -> reload.


Watch item: agent/server version mismatch

Both vault-agent-* services reported:

  • Vault Agent: 1.21.1
  • Vault server: 1.20.4

That mismatch is not automatically broken, but it is a real operational signal. After upgrades, re-test auth, renewal, and template rendering.


Tiny post-rotation checklist (per app)

  • [ ] *.fullchain.pem exists where expected
  • [ ] expiry is sane
  • [ ] subject CN matches expected domain
  • [ ] Vault Agent service is active (running)
  • [ ] cert-auth login works with generated agent cert

Next up

Part 4 covers the edge and the failure you saw:

  • proxy CA chain refresh for the edge proxy
  • proxy + Vault container restarts
  • why the final healthcheck returned “connection reset by peer” and what to do right after