Rotate Everything in Production (Part 1): The Order That Keeps You Out of Trouble
This series documents the exact “rotate everything” workflow we ran in prod, and why the order matters more than the individual commands.
What this post focuses on:
- dependency chain (what breaks what)
- minimal checks that keep you safe
- sequencing to avoid lockouts
Scope from the run: ENV=prod, VAULT_ADDR=https://127.0.0.1:22400, apps ncprd1,resume2me, proxy app proxyprod, SNI vault.prod.privsec.ch.
TL;DR
- Rotate the admin client cert first so vault status keeps working.
- Vault server TLS is the center of gravity; everything else must be ready for it.
- Agents before apps; proxy chain last; healthcheck gates the end.
One-line order of operations
- Rotate the admin client cert (mTLS).
- Rotate Vault server TLS (listener cert).
- Rotate agent login mTLS certs + cert-auth mappings.
- Rotate app leaf certs (nginx/app frontends).
- Refresh proxy CA chain + trust material.
- Restart proxy stack and Vault server container (if required).
- Run a Vault TLS healthcheck (seal status / s_client).
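The order above can be enforced mechanically instead of by memory. This is a minimal sketch that assumes each step is wrapped in a shell function. The seven function names are hypothetical placeholders, not the contents of the real wrapper script. The property it shows is stop-on-first-failure: a broken step halts the run, so later steps never build on it.

```bash
#!/usr/bin/env bash
# Sketch of an ordered rotation driver. The seven step names are
# HYPOTHETICAL shell functions standing in for whatever the real wrapper
# does; define them before calling run_rotation.

# Order matters: each step may depend on every step before it.
ROTATION_STEPS=(
  rotate_admin_client_cert
  rotate_vault_server_tls
  rotate_agent_login_certs
  rotate_app_leaf_certs
  refresh_proxy_trust
  restart_proxy_and_vault
  vault_tls_healthcheck
)

# run_rotation: call each step in order and stop at the first failure, so a
# broken step (say, a bad listener cert) never cascades into the steps that
# depend on it. Returns 1 and names the failed step on error.
run_rotation() {
  local step
  for step in "${ROTATION_STEPS[@]}"; do
    echo "==> ${step}"
    if ! "${step}"; then
      echo "FAILED at ${step}; later steps were not run" >&2
      return 1
    fi
  done
  echo "rotation complete"
}
```

In a real run, each function would hold the commands for that step. The driver only guarantees ordering and stop-on-failure.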
Why this order matters
1) Don’t lock yourself out
If you rotate server TLS before you have a working admin client cert and trust chain, you can lose the only reliable way to run vault status.
Start with the admin cert so you always have a known-good identity to validate the rest of the pipeline.
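A minimal way to prove the new admin identity before touching the listener is to run vault status with the new files. The sketch passes them through the Vault CLI's standard TLS environment variables (VAULT_CACERT, VAULT_CLIENT_CERT, VAULT_CLIENT_KEY, VAULT_TLS_SERVER_NAME). The address and SNI come from this run. The function name is ours, and the caller supplies the file paths. This is not the wrapper's actual check.

```bash
#!/usr/bin/env bash
# Sketch: prove the NEW admin client cert works before touching the listener.
# VAULT_ADDR default and SNI are from the run; admin_vault_status is a
# hypothetical helper name, and no real file paths are assumed.

# admin_vault_status CERT KEY CA
# Returns 3 if any file is missing, otherwise vault's own exit code
# (0 = unsealed, 2 = sealed, 1 = error).
admin_vault_status() {
  local cert=$1 key=$2 ca=$3 f
  for f in "$cert" "$key" "$ca"; do
    if [[ ! -r $f ]]; then
      echo "missing or unreadable: $f" >&2
      return 3
    fi
  done
  # Per-command env assignments: nothing leaks into the calling shell.
  VAULT_ADDR="${VAULT_ADDR:-https://127.0.0.1:22400}" \
  VAULT_TLS_SERVER_NAME="vault.prod.privsec.ch" \
  VAULT_CACERT="$ca" \
  VAULT_CLIENT_CERT="$cert" \
  VAULT_CLIENT_KEY="$key" \
    vault status
}
```

Move on to the listener only when this returns 0 with the new cert and key. A return of 2 means the server is sealed, but the TLS and mTLS handshake still succeeded.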
2) Vault server TLS is the center of gravity
Once the listener changes, every client path must still work:
- Vault Agent mTLS login certs must match their cert-auth mappings
- CA bundles used by clients/proxies must include the right chain
- SNI must match what clients validate (vault.prod.privsec.ch)
If clients are not ready, you will typically see errors such as:
- connection reset by peer
- OpenSSL errno=104
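Before restarting Vault on the new listener cert, you can check the chain and SNI requirements above offline with openssl. Verify the new cert against the same CA bundle the clients use, then confirm it covers the SNI name. This is a sketch that assumes you pass in the cert, the client-side CA bundle, and optionally an intermediates file. The function name and argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: offline pre-restart check of a new Vault listener cert. The SNI
# name is from this run; check_listener_cert is a hypothetical helper.

VAULT_SNI="vault.prod.privsec.ch"

# check_listener_cert CERT CA_BUNDLE [INTERMEDIATES]
# Returns 3 on missing files, 1 if the chain does not verify, 2 if the cert
# does not cover VAULT_SNI, 0 if both checks pass.
check_listener_cert() {
  local cert=$1 ca=$2 chain=${3:-} f
  for f in "$cert" "$ca" ${chain:+"$chain"}; do
    if [[ ! -r $f ]]; then
      echo "missing or unreadable: $f" >&2
      return 3
    fi
  done

  # Chain: does the CA bundle the clients use actually trust this cert?
  local -a untrusted=()
  if [[ -n $chain ]]; then
    untrusted=(-untrusted "$chain")
  fi
  if ! openssl verify -CAfile "$ca" "${untrusted[@]}" "$cert" >/dev/null; then
    echo "chain check failed: $ca does not verify $cert" >&2
    return 1
  fi

  # Name: x509 -checkhost reports the result as text, so match the output
  # instead of relying on the exit status.
  local name_check
  name_check=$(openssl x509 -noout -in "$cert" -checkhost "$VAULT_SNI" 2>&1 || true)
  if [[ $name_check != *"does match"* ]]; then
    echo "name check failed: $cert does not cover $VAULT_SNI" >&2
    return 2
  fi
  echo "ok: $cert chains to $ca and covers $VAULT_SNI"
}
```

Running this before the restart surfaces chain and name problems while the old listener is still serving.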
3) Agents before apps
Apps depend on agents to authenticate and renew. If cert-auth is broken, leaf rotation becomes flaky or fails silently.
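To make "agents before apps" concrete, try a cert-auth login with each agent's new cert before rotating any app leaf certs. This sketch uses vault login -method=cert with -no-store, so the test token is not saved. It assumes VAULT_ADDR, VAULT_CACERT and VAULT_TLS_SERVER_NAME are already exported. The cert layout under AGENT_CERT_ROOT (client.crt and client.key per app) and the one-role-per-app naming are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify agent cert-auth logins per app. APPS default is from the
# run; AGENT_CERT_ROOT layout and role names (= app names) are HYPOTHETICAL.

APPS="${APPS:-ncprd1,resume2me}"
AGENT_CERT_ROOT="${AGENT_CERT_ROOT:-/path/to/agent-certs}"   # hypothetical

# check_agent_logins: one login attempt per app. Prints a line per failing
# app and returns the number of failures (0 = all agents can log in).
check_agent_logins() {
  local app dir failures=0
  local -a apps
  IFS=',' read -r -a apps <<< "$APPS"
  for app in "${apps[@]}"; do
    dir="$AGENT_CERT_ROOT/$app"
    if [[ ! -r $dir/client.crt || ! -r $dir/client.key ]]; then
      echo "FAIL $app: cert/key not found in $dir"
      failures=$((failures + 1))
      continue
    fi
    # -no-store keeps the test token out of the local token helper.
    if ! vault login -no-store -method=cert \
        -client-cert="$dir/client.crt" -client-key="$dir/client.key" \
        name="$app" >/dev/null; then
      echo "FAIL $app: cert-auth login rejected (check the role's cert mapping)"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

A non-zero return is the signal to stop before leaf rotation, because a rejected login here is what later turns into flaky renewals.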
4) Proxy chain last (but before restarts)
Refresh proxy trust after internal rotations so you do not churn the public edge while core pieces are still changing.
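When refreshing proxy trust, one practical risk is the proxy picking up a half-written or malformed bundle. This is a sketch of a hypothetical helper that concatenates PEM files into a new bundle and swaps it in with a same-directory rename. It checks PEM structure only, not trust, so still run openssl verify against the result.

```bash
#!/usr/bin/env bash
# Sketch: rebuild a proxy CA bundle and swap it in atomically. The helper
# name is hypothetical; it checks PEM structure only, not trust.

# build_ca_bundle OUT IN1 [IN2 ...]
# Concatenates the input PEM files (in the order given) into OUT.
build_ca_bundle() {
  if (( $# < 2 )); then
    echo "usage: build_ca_bundle OUT IN1 [IN2 ...]" >&2
    return 2
  fi
  local out=$1 tmp f n total=0
  shift
  # Temp file in the same directory, so the final mv is an atomic rename.
  tmp=$(mktemp "${out}.tmp.XXXXXX") || return 1
  for f in "$@"; do
    n=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$f" 2>/dev/null) || n=0
    if (( n == 0 )); then
      echo "no PEM certificate found in: $f" >&2
      rm -f "$tmp"
      return 1
    fi
    total=$((total + n))
    cat "$f" >> "$tmp"
    # Guard against inputs without a trailing newline gluing PEM blocks.
    if [[ -n $(tail -c 1 "$f") ]]; then echo >> "$tmp"; fi
  done
  # CA certificates are public, but mktemp creates files as 0600.
  chmod 0644 "$tmp"
  mv -f "$tmp" "$out"
  echo "wrote $out with $total certificate(s)"
}
```

Keeping the temp file next to the target matters. A rename is only atomic within one filesystem, so a file staged in /tmp could be copied rather than renamed into place.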
Preflight checklist (short and honest)
- Confirm ENV and Vault address:
  ENV=prod
  VAULT_ADDR=https://127.0.0.1:22400
- Confirm SNI / TLS server name:
  vault.prod.privsec.ch
- Confirm which apps are included:
  APPS=ncprd1,resume2me
  PROXY_APPS=proxyprod
- Know exactly which systemd user units you must restart:
  - vault-agent units per app
  - proxy container unit
  - Vault container unit
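The first three checklist items lend themselves to a fail-fast script. The expected values below are the ones from this run. Checking the SNI through VAULT_TLS_SERVER_NAME (the Vault CLI's SNI override variable) is our assumption about where the name is configured. Unit names are not checked because they are site-specific.

```bash
#!/usr/bin/env bash
# Sketch: fail fast if the shell is not set up for the intended run.
# Values are from this run; using VAULT_TLS_SERVER_NAME for the SNI is an
# assumption. preflight is a hypothetical helper name.

# preflight: compare each NAME=VALUE pair against the current shell.
# Prints a line per mismatch to stderr and returns the mismatch count.
preflight() {
  local errors=0 pair name want
  local -a checks=(
    ENV=prod
    VAULT_ADDR=https://127.0.0.1:22400
    VAULT_TLS_SERVER_NAME=vault.prod.privsec.ch
    APPS=ncprd1,resume2me
    PROXY_APPS=proxyprod
  )
  for pair in "${checks[@]}"; do
    name=${pair%%=*}      # text before the first '='
    want=${pair#*=}       # everything after the first '='
    if [[ ${!name:-} != "$want" ]]; then
      echo "preflight: $name='${!name:-}', expected '$want'" >&2
      errors=$((errors + 1))
    fi
  done
  if (( errors == 0 )); then echo "preflight ok"; fi
  return "$errors"
}
```

Listing all mismatches at once, rather than exiting on the first one, saves a round trip when several variables are wrong.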
Minimal “I feel safe” checks during the run
After rotating the admin client cert:
- vault status works with the new cert/key
After rotating Vault server TLS + restart:
- Listener accepts TLS with the expected SNI
- vault status works again
After agent/login + app leaf rotation:
- *.fullchain.pem files exist and are neither expired nor close to expiry
At the end:
- sys/seal-status responds without resets
- openssl s_client completes cleanly
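The fullchain and end-of-run checks can be scripted too. This sketch assumes a 14-day minimum remaining validity as the meaning of "sane expiry" and takes cert, key and CA paths as arguments. The port, loopback address and SNI are from this run. The helper names are hypothetical. curl --resolve sends the SNI name while connecting to 127.0.0.1, so the check exercises the same name clients validate.

```bash
#!/usr/bin/env bash
# Sketch of the end-of-run gate. Port, loopback address and SNI come from
# this run; the 14-day threshold and the helper names are assumptions.

VAULT_SNI="vault.prod.privsec.ch"
VAULT_PORT=22400
MIN_DAYS="${MIN_DAYS:-14}"   # assumption: "sane expiry" = 14+ days left

# check_fullchain_expiry DIR: every *.fullchain.pem in DIR must stay valid
# for at least MIN_DAYS. Also fails if DIR holds no such files at all.
check_fullchain_expiry() {
  local dir=$1 f count=0 bad=0
  for f in "$dir"/*.fullchain.pem; do
    [[ -e $f ]] || continue        # unmatched glob stays literal; skip it
    count=$((count + 1))
    # -checkend N exits non-zero if the first cert in the file expires
    # within N seconds (or cannot be read).
    if ! openssl x509 -noout -checkend $((MIN_DAYS * 86400)) -in "$f" >/dev/null; then
      echo "expiring or unreadable: $f" >&2
      bad=1
    fi
  done
  if (( count == 0 )); then
    echo "no *.fullchain.pem files in $dir" >&2
    return 1
  fi
  return "$bad"
}

# final_healthcheck CERT KEY CA: seal-status over HTTPS with the expected
# SNI, then a full openssl handshake. Returns 1 if the HTTPS call fails
# (resets included) and 2 if the handshake fails.
final_healthcheck() {
  local cert=$1 key=$2 ca=$3
  # --resolve maps the SNI name to the loopback listener used in this run.
  curl -fsS --cacert "$ca" --cert "$cert" --key "$key" \
    --resolve "$VAULT_SNI:$VAULT_PORT:127.0.0.1" \
    "https://$VAULT_SNI:$VAULT_PORT/v1/sys/seal-status" || return 1
  echo
  # -verify_return_error turns a chain failure into a failed handshake.
  openssl s_client -connect "127.0.0.1:$VAULT_PORT" -servername "$VAULT_SNI" \
    -CAfile "$ca" -cert "$cert" -key "$key" \
    -verify_return_error -brief </dev/null || return 2
}
```

sys/seal-status is an unauthenticated endpoint. The client cert is passed only because an mTLS listener may demand one during the handshake.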
Wrapper command from the run
sudo /usr/local/sbin/vault-run-home-safe prod -- \
/opt/vault-sec/wrappers/mtls-full-rotate-prod-root.sh
Next up
Part 2 covers the first two rotations:
- admin client cert (permissions, exports, trust chain)
- Vault server TLS (owner, chain files, restart implications)