Rotate Everything in Production (Part 4): Proxy Chain Refresh, Restarts, and the 'Connection Reset' Healthcheck

This final part covers:

(6) Proxy CA chain refresh (proxyprod)
(6b) Proxy stack restart (systemd --user)
(2b) Vault server container restart
(7) Vault TLS healthcheck that ended with connection reset by peer

TL;DR

Refresh proxy trust, then restart proxy and Vault containers.
Treat the final healthcheck as a hard gate, not a nice-to-have.
If you see resets, pause, re-check, and inspect unit logs.

6) Proxy CA chain refresh (proxyprod)

What the run was aiming for

Keep proxy trust aligned with the CA chain Vault and your issued leaf certs depend on.

From the run:

ENV=prod
VAULT_ADDR=https://127.0.0.1:22400
TLS_SNI=vault.prod.privsec.ch
TLS_DIR=/home/vaultprod/tls-prod
root CA source: /home/vaultprod/tls-prod/root_ca.pem
cert mapping: auth/cert/certs/agent-proxyprod

What the run changed

root CA mirrored to /home/proxyprod/vault/ca/ca.pem
cert mapping updated with policies:
- marker-cert-auth
- pki-issue-proxyprod
- pki-ca-read-proxyprod
mTLS CN ok: agent-proxyprod.prod.privsec.ch
post-hook installed:
- /home/proxyprod/.vault-agent-proxyprod-ca/bin/vault-agent-post-chain.sh
user service restarted:
- vault-agent-proxyprod-ca.service (v4.9)
CA chain confirmed:
- /home/proxyprod/nginx/ca/current-ca-chain.pem
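
One cheap way to confirm the mirror step is a byte-for-byte comparison between the source root CA and the mirrored copy. This is a minimal sketch, not part of the run: the helper name ca_mirror_in_sync is hypothetical, and it assumes the invoking user can read both files (in practice that may mean running it with sudo).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the run): succeeds only if the mirrored CA
# file is byte-identical to the source file.
# Returns 0 = identical, 1 = different, 2 = a file is missing or empty.
ca_mirror_in_sync() {
  local src="$1" dst="$2"
  # Both files must exist and be non-empty before comparing.
  [ -s "$src" ] && [ -s "$dst" ] || return 2
  cmp -s "$src" "$dst"
}

# Paths as reported by the run; override via environment if yours differ.
SRC="${SRC:-/home/vaultprod/tls-prod/root_ca.pem}"
DST="${DST:-/home/proxyprod/vault/ca/ca.pem}"

if ca_mirror_in_sync "$SRC" "$DST"; then
  echo "CA mirror in sync"
else
  echo "CA mirror differs or is missing" >&2
fi
```

A mismatch here is worth resolving before any restart, since the proxy would otherwise come back up trusting the wrong root.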

Proxy step verification (from the run)

sudo -u proxyprod journalctl --user -u vault-agent-proxyprod-ca.service -e -n 60

openssl s_client -connect 127.0.0.1:22300 -servername vault.prod.privsec.ch \
  -CAfile /home/proxyprod/vault/ca/ca.pem -brief </dev/null | sed -n '1,10p'

6b) Restart proxy stacks (systemd --user)

The run restarted:

container-proxyprod.service

It showed active (running) immediately after.

Why restart matters: if the proxy reads certs/chain from disk, it will keep the old material until you reload or restart.

2b) Restart Vault server container (systemd --user)

The run restarted:

container-vaultprod.service

It showed active (running) right away, but “running” is not the same as “ready to accept TLS connections.”

7) The TLS healthcheck failed (in this run)

Immediately after the restarts, the health check printed:

Error checking seal status: Get "https://127.0.0.1:22400/v1/sys/seal-status": read tcp 127.0.0.1:36828->127.0.0.1:22400: read: connection reset by peer

OpenSSL showed the same reset (errno 104 is ECONNRESET on Linux):

write:errno=104

The script still ended with:

[FULL ROTATE] mTLS FULL ROTATION COMPLETED.

So: rotation steps completed, but the final “are we alive?” check caught a real failure at that moment.

What “connection reset” usually means here

At this exact point in the timeline (right after a container restart), common causes are:

Vault is still starting and the listener is not ready
Vault restarted and dropped the connection
TLS listener is up but failed to load its key/cert/chain, or loaded a mismatched set
something on localhost:22400 accepts and immediately closes

Operational takeaway: treat the healthcheck as the gate that tells you whether to intervene now.
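
To make the healthcheck an actual gate, the success banner has to depend on its exit status. The sketch below shows one way to do that. The function name final_gate and the failure message are hypothetical, and this is not how the script in the run was structured (it printed the banner regardless).

```shell
#!/usr/bin/env bash
# Hypothetical final stage of a rotation script: the completion banner is
# printed only if the healthcheck command passed as arguments succeeds.
final_gate() {
  # "$@" is the healthcheck command, for example: vault status
  if "$@" >/dev/null 2>&1; then
    echo "[FULL ROTATE] mTLS FULL ROTATION COMPLETED."
    return 0
  fi
  # Failure message is illustrative, not output from the original script.
  echo "[FULL ROTATE] rotation steps ran, but the healthcheck FAILED." >&2
  return 1
}

# Usage at the end of the script:
#   final_gate vault status || exit 1
```

Note that vault status exits non-zero when Vault is sealed, so with this gate a sealed Vault counts as a failure. Whether that is what you want depends on how your deployment unseals after a restart.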

Immediate response checklist (fast, low drama)

1) Add a short pause

sleep 20

2) Re-run the simplest Vault check

vault status

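If it works, you validated the listener and TLS trust in one hit. It does not exercise token auth: vault status reads sys/seal-status, which does not require a token.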
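
The exit code of vault status carries more information than pass/fail: the CLI documents 0 for unsealed, 2 for sealed, and 1 for an error. After a container restart, a sealed result still proves the listener and TLS are working. Here is a small sketch for reading that code. The function name describe_vault_status_rc is hypothetical, and the usage line assumes VAULT_ADDR and CA trust are already set in your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the documented exit codes of `vault status`
# to a short verdict. 0 = unsealed, 2 = sealed, 1 (or anything else) = error.
describe_vault_status_rc() {
  case "$1" in
    0) echo "reachable, unsealed" ;;
    2) echo "reachable, sealed" ;;
    *) echo "error (listener, TLS, or network)" ;;
  esac
}

# Usage (assumes VAULT_ADDR and CA trust are configured):
#   rc=0; vault status >/dev/null 2>&1 || rc=$?
#   describe_vault_status_rc "$rc"
```

Only the error case matches the connection reset seen in this run. The sealed case calls for a different response (unseal) rather than log inspection.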

3) If it still resets, check unit logs

sudo -u vaultprod journalctl --user -u container-vaultprod.service -e -n 120
sudo -u proxyprod journalctl --user -u container-proxyprod.service -e -n 120

4) Re-check TLS handshake with explicit SNI

openssl s_client -connect 127.0.0.1:22400 -servername vault.prod.privsec.ch -brief </dev/null | sed -n '1,20p'

The big lesson

Ordering is necessary but not sufficient. Every rotation step can succeed and the final check can still fail, because readiness after a restart is a timing and stability problem.

The best hardening is not more steps. It is:

a short readiness delay
or a retry wrapper around the healthcheck
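
A retry wrapper can be as small as the sketch below. The function name retry and the attempt/delay values in the usage line are assumptions for illustration, not something the run used. Tune them to how long your Vault container takes to open its listener.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Usage (values are illustrative): up to 6 tries, 5 seconds apart.
#   retry 6 5 vault status
```

Compared with a fixed sleep, the wrapper returns as soon as the check passes and still fails loudly if it never does.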

No new concepts, just a less racy end state.