Rotate Everything in Production (Part 4): Proxy Chain Refresh, Restarts, and the 'Connection Reset' Healthcheck
This final part covers:
- (6) Proxy CA chain refresh (
proxyprod) - (6b) Proxy stack restart (systemd
--user) - (2b) Vault server container restart
- (7) Vault TLS healthcheck that ended with
connection reset by peer
TL;DR
- Refresh proxy trust, then restart proxy and Vault containers.
- Treat the final healthcheck as a hard gate, not a nice-to-have.
- If you see resets, pause, re-check, and inspect unit logs.
6) Proxy CA chain refresh (proxyprod)
What the run was aiming for
Keep proxy trust aligned with the CA chain Vault and your issued leaf certs depend on.
From the run:
ENV=prodVAULT_ADDR=https://127.0.0.1:22400TLS_SNI=vault.prod.privsec.chTLS_DIR=/home/vaultprod/tls-prod- root CA source:
/home/vaultprod/tls-prod/root_ca.pem - cert mapping:
auth/cert/certs/agent-proxyprod
What the run changed
- root CA mirrored to
/home/proxyprod/vault/ca/ca.pem - cert mapping updated with policies:
marker-cert-authpki-issue-proxyprodpki-ca-read-proxyprod
- mTLS CN ok:
agent-proxyprod.prod.privsec.ch - post-hook installed:
/home/proxyprod/.vault-agent-proxyprod-ca/bin/vault-agent-post-chain.sh
- user service restarted:
vault-agent-proxyprod-ca.service(v4.9)
- CA chain confirmed:
/home/proxyprod/nginx/ca/current-ca-chain.pem
Proxy step verification (from the run)
sudo -u proxyprod journalctl --user -u vault-agent-proxyprod-ca.service -e -n 60
openssl s_client -connect 127.0.0.1:22300 -servername vault.prod.privsec.ch \
-CAfile /home/proxyprod/vault/ca/ca.pem -brief </dev/null | sed -n '1,10p'
6b) Restart proxy stacks (systemd --user)
The run restarted:
container-proxyprod.service
It showed active (running) immediately after.
Why restart matters: if the proxy reads certs/chain from disk, it will keep the old material until you reload or restart.
2b) Restart Vault server container (systemd --user)
The run restarted:
container-vaultprod.service
It showed active (running) right away, but “running” is not the same as “ready to accept TLS connections.”
7) The TLS healthcheck failed (in this run)
Immediately after the restarts, the health check printed:
Error checking seal status: Get "https://127.0.0.1:22400/v1/sys/seal-status": read tcp 127.0.0.1:36828->127.0.0.1:22400: read: connection reset by peer
OpenSSL showed:
write:errno=104
The script still ended with:
[FULL ROTATE] mTLS FULL ROTATION COMPLETED.
So: rotation steps completed, but the final “are we alive?” check caught a real failure at that moment.
What “connection reset” usually means here
At this exact point in the timeline (right after a container restart), common causes are:
- Vault is still starting and the listener is not ready
- Vault restarted and dropped the connection
- TLS listener is up but misread key/cert/chain
- something on localhost:22400 accepts and immediately closes
Operational takeaway: treat the healthcheck as the gate that tells you whether to intervene now.
Immediate response checklist (fast, low drama)
1) Add a short pause
sleep 20
2) Re-run the simplest Vault check
vault status
If it works, you validated: listener + auth + trust in one hit.
3) If it still resets, check unit logs
sudo -u vaultprod journalctl --user -u container-vaultprod.service -e -n 120
sudo -u proxyprod journalctl --user -u container-proxyprod.service -e -n 120
4) Re-check TLS handshake with explicit SNI
openssl s_client -connect 127.0.0.1:22400 -servername vault.prod.privsec.ch -brief </dev/null | sed -n '1,20p'
The big lesson
Ordering is necessary but not sufficient. The rotation can be perfect and still fail because readiness after restart is a timing and stability problem.
The best hardening is not more steps. It is:
- a short readiness delay
- or a retry wrapper around the healthcheck
No new concepts, just a less racey end state.