This series documents the exact “rotate everything” workflow we ran in prod, and why the order matters more than the individual commands.

What this post focuses on:

  • dependency chain (what breaks what)
  • minimal checks that keep you safe
  • sequencing to avoid lockouts

Scope from the run: ENV=prod, VAULT_ADDR=https://127.0.0.1:22400, apps ncprd1 and resume2me, proxy app proxyprod, SNI vault.prod.privsec.ch.


TL;DR

  • Rotate the admin client cert first so you can still run vault status.
  • Vault server TLS is the center of gravity; everything else must be ready for it.
  • Agents before apps; proxy chain last; healthcheck gates the end.

One-line order of operations

  1. Rotate the admin client cert (mTLS).
  2. Rotate Vault server TLS (listener cert).
  3. Rotate agent login mTLS certs + cert-auth mappings.
  4. Rotate app leaf certs (nginx/app frontends).
  5. Refresh proxy CA chain + trust material.
  6. Restart proxy stack and Vault server container (if required).
  7. Run a Vault TLS healthcheck (seal status / s_client).
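
The ordering above can be sketched as a fail-fast runner. This is a minimal illustration, not the wrapper from the run: the step names are hypothetical labels, and the runner you pass in is whatever maps a label to your own script.

```shell
# Sketch: run the rotation steps strictly in order and stop at the first failure.
# The step names are hypothetical labels, not the scripts from the run.
STEPS="rotate_admin_client_cert
rotate_vault_server_tls
rotate_agent_login_certs
rotate_app_leaf_certs
refresh_proxy_ca_chain
restart_proxy_and_vault
vault_tls_healthcheck"

# run_steps RUNNER
# Calls "RUNNER <step>" for each step in order and returns non-zero as soon as
# one fails, so later steps never run against a half-rotated state.
run_steps() {
  for step in $STEPS; do
    echo "==> ${step}"
    if ! "$1" "$step"; then
      echo "FAILED at ${step}; stopping" >&2
      return 1
    fi
  done
  echo "all steps completed"
}
```

The point of the design is that a failure in step 2 never reaches step 3: nothing downstream is rotated against a listener that did not come back.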

Why this order matters

1) Don’t lock yourself out

If you rotate server TLS before you have a working admin client cert and trust chain, you can lose the only reliable way to run vault status.

Start with the admin cert so you always have a known-good identity to validate the rest of the pipeline.
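
A small check along these lines can confirm the new admin identity before anything else is touched. It is a sketch under assumptions: the function name and the certificate paths you pass in are hypothetical, while the address and server name are the ones from the scope above. It relies on the Vault CLI's standard TLS environment variables.

```shell
# check_admin_identity CERT KEY CA
# Returns 0 if Vault answers over mTLS with the given client identity.
# CERT, KEY and CA are paths you supply; none of them come from the run.
check_admin_identity() {
  rc=0
  VAULT_ADDR="https://127.0.0.1:22400" \
  VAULT_TLS_SERVER_NAME="vault.prod.privsec.ch" \
  VAULT_CLIENT_CERT="$1" \
  VAULT_CLIENT_KEY="$2" \
  VAULT_CACERT="$3" \
    vault status >/dev/null 2>&1 || rc=$?
  # vault status exits 0 when unsealed, 2 when sealed, 1 on error.
  # Both 0 and 2 mean the TLS handshake and client cert were accepted.
  case "$rc" in
    0|2) echo "admin identity OK (vault status rc=${rc})" ;;
    *)   echo "admin identity FAILED (vault status rc=${rc})" >&2; return 1 ;;
  esac
}
```

Exit code 2 is treated as success here because a sealed Vault still proves the TLS path and the client identity work. Tighten that if you expect Vault to be unsealed at this point.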

2) Vault server TLS is the center of gravity

Once the listener changes, every client path must still work:

  • Vault Agent mTLS login certs must match their cert-auth mappings
  • CA bundles used by clients/proxies must include the right chain
  • SNI must match what clients validate (vault.prod.privsec.ch)

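If clients are not ready, you will see connection resets, reported in two forms:

  • connection reset by peer
  • OpenSSL errno=104 (ECONNRESET on Linux, the same reset as seen by openssl s_client)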

3) Agents before apps

Apps depend on agents to authenticate and renew. If cert-auth is broken, agents cannot log in, and leaf rotation becomes flaky or fails silently.

4) Proxy chain last (but before restarts)

Refresh proxy trust after internal rotations so you do not churn the public edge while core pieces are still changing.


Preflight checklist (short and honest)

  • Confirm ENV and Vault address:
    • ENV=prod
    • VAULT_ADDR=https://127.0.0.1:22400
  • Confirm SNI / TLS server name:
    • vault.prod.privsec.ch
  • Confirm which apps are included:
    • APPS=ncprd1,resume2me
    • PROXY_APPS=proxyprod
  • Know exactly which systemd user units you must restart:
    • vault-agent units per app
    • proxy container unit
    • Vault container unit
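
The variable part of the checklist can be turned into a guard that refuses to continue on a mismatch. This is only a sketch: the function names are hypothetical, and carrying the SNI in VAULT_TLS_SERVER_NAME (the Vault CLI's variable) is an assumption about how your shell is set up. The systemd units are left as a manual check because their names are not given here.

```shell
# expect_var NAME EXPECTED
# Compares the current value of the variable NAME with EXPECTED.
expect_var() {
  eval "actual=\${$1:-}"
  if [ "$actual" = "$2" ]; then
    echo "ok   $1=$actual"
  else
    echo "FAIL $1='$actual' (expected '$2')" >&2
    return 1
  fi
}

# preflight: returns non-zero if any value differs from the scope of the run.
preflight() {
  bad=0
  expect_var ENV "prod" || bad=1
  expect_var VAULT_ADDR "https://127.0.0.1:22400" || bad=1
  # Assumption: the SNI is exported as VAULT_TLS_SERVER_NAME.
  expect_var VAULT_TLS_SERVER_NAME "vault.prod.privsec.ch" || bad=1
  expect_var APPS "ncprd1,resume2me" || bad=1
  expect_var PROXY_APPS "proxyprod" || bad=1
  return "$bad"
}
```

Every mismatch is reported before the function returns, so a single run shows everything that is off rather than one variable at a time.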

Minimal “I feel safe” checks during the run

After rotating the admin client cert:

  • vault status works with the new cert/key

After rotating Vault server TLS + restart:

  • Listener accepts TLS with expected SNI
  • vault status works again
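
The SNI check can be done with openssl s_client directly against the listener. A minimal sketch, assuming hypothetical paths for the client cert, key and CA bundle, and an OpenSSL recent enough to support -verify_hostname (1.1.0 or later):

```shell
# tls_probe CERT KEY CA
# Handshakes with the listener using the expected SNI and fails on any
# chain or hostname verification error.
tls_probe() {
  rc=0
  out="$(openssl s_client \
      -connect 127.0.0.1:22400 \
      -servername vault.prod.privsec.ch \
      -verify_hostname vault.prod.privsec.ch \
      -verify_return_error \
      -CAfile "$3" -cert "$1" -key "$2" </dev/null 2>&1)" || rc=$?
  if [ "$rc" -eq 0 ] && printf '%s\n' "$out" | grep -q 'Verify return code: 0 (ok)'; then
    echo "TLS OK for SNI vault.prod.privsec.ch"
  else
    echo "TLS probe FAILED (rc=${rc})" >&2
    # Show the tail of the s_client output; resets usually appear here.
    printf '%s\n' "$out" | tail -n 5 >&2
    return 1
  fi
}
```

Redirecting stdin from /dev/null makes s_client exit once the handshake is done instead of waiting for input.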

After agent login + app leaf rotation:

  • *.fullchain.pem files exist with sane expiry
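
One way to make "sane expiry" concrete is openssl x509 -checkend, which fails if a certificate expires within a given number of seconds. The sketch below assumes a hypothetical directory holding the *.fullchain.pem files and an arbitrary default threshold of 7 days. Note that openssl x509 reads only the first certificate in each file, which in a fullchain is normally the leaf.

```shell
# check_fullchains DIR [MIN_DAYS]
# Fails if DIR has no *.fullchain.pem files, or if any of them expires
# within MIN_DAYS days (default 7) or cannot be parsed.
check_fullchains() {
  dir="$1"; min_days="${2:-7}"; failed=0; found=0
  for f in "$dir"/*.fullchain.pem; do
    [ -e "$f" ] || continue   # glob matched nothing
    found=1
    if openssl x509 -in "$f" -noout -checkend "$((min_days * 86400))" >/dev/null 2>&1; then
      echo "OK   $f ($(openssl x509 -in "$f" -noout -enddate))"
    else
      echo "FAIL $f expires within ${min_days} days or is unreadable" >&2
      failed=1
    fi
  done
  [ "$found" -eq 1 ] || { echo "no *.fullchain.pem under $dir" >&2; return 1; }
  return "$failed"
}
```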

At the end:

  • sys/seal-status responds without resets
  • openssl s_client completes cleanly
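
The seal-status half of the final gate can be sketched with curl against Vault's unauthenticated /v1/sys/seal-status endpoint. The certificate paths are hypothetical. --resolve pins the SNI name to the loopback address from the scope, so the request validates the certificate against vault.prod.privsec.ch while still hitting 127.0.0.1:22400.

```shell
# seal_status_ok CERT KEY CA
# Returns 0 only if the endpoint answers over mTLS and reports sealed=false.
seal_status_ok() {
  body="$(curl --silent --show-error --fail --max-time 10 \
      --resolve "vault.prod.privsec.ch:22400:127.0.0.1" \
      --cacert "$3" --cert "$1" --key "$2" \
      "https://vault.prod.privsec.ch:22400/v1/sys/seal-status")" || {
    echo "seal-status request failed" >&2
    return 1
  }
  if printf '%s\n' "$body" | grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false'; then
    echo "unsealed and reachable"
  else
    echo "reachable but sealed or unexpected response" >&2
    return 1
  fi
}
```

A reset at this stage shows up as a curl failure, which makes this a reasonable last gate before calling the rotation done.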

Wrapper command from the run

sudo /usr/local/sbin/vault-run-home-safe prod -- \
  /opt/vault-sec/wrappers/mtls-full-rotate-prod-root.sh

Next up

Part 2 covers the first two rotations:

  • admin client cert (permissions, exports, trust chain)
  • Vault server TLS (owner, chain files, restart implications)