Rotate Everything in Production (Part 4): Proxy Chain Refresh, Restarts, and the 'Connection Reset' Healthcheck
This final part covers:
- (6) Proxy CA chain refresh
- (6b) Proxy stack restart (systemd --user)
- (2b) Vault server container restart
- (7) Vault TLS healthcheck that ended with connection reset by peer
TL;DR
- Refresh proxy trust, then restart proxy and Vault containers.
- Treat the final healthcheck as a hard gate, not a nice-to-have.
- If you see resets, pause, re-check, and inspect unit logs.
6) Proxy CA chain refresh
What the run was aiming for
Keep proxy trust aligned with the CA chain Vault and your issued leaf certs depend on.
From the run:
- one production environment
- a local Vault HTTPS listener
- an environment-specific TLS server name
- a dedicated Vault TLS directory
- a root CA source copied from the Vault TLS directory
- one cert-auth mapping for the proxy agent
What the run changed
- root CA mirrored to the proxy user's Vault CA path
- cert mapping updated with policies:
  - proxy issuance policy
  - proxy CA-read policy
- mTLS CN matched the expected proxy agent identity
- post-hook installed for proxy chain refresh
- dedicated proxy CA user service restarted
- CA chain confirmed at the proxy trust path
Proxy step verification (from the run)
Check the proxy user's journal for the CA agent service, then run an OpenSSL check against the local Vault listener with the expected server name.
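A rough sketch of that verification, written as two small shell functions. The unit name, host:port, server name, and CA path are all placeholders I made up for illustration (the run's real values are not shown here), so override them through the environment before using this.

```shell
# Placeholders: every default below is hypothetical, not taken from the run.
PROXY_CA_UNIT="${PROXY_CA_UNIT:-proxy-ca-agent.service}"
VAULT_HOSTPORT="${VAULT_HOSTPORT:-127.0.0.1:8200}"
VAULT_SNI="${VAULT_SNI:-vault.example.internal}"
PROXY_CA_FILE="${PROXY_CA_FILE:-$HOME/vault-ca/ca.pem}"

# Show the most recent journal lines for the proxy user's CA agent service.
# Run this as the proxy user, since it reads that user's journal.
show_ca_agent_journal() {
  journalctl --user -u "$PROXY_CA_UNIT" -n 50 --no-pager
}

# Handshake against the local Vault listener with explicit SNI and verify
# the served chain against the mirrored root CA.
# Returns non-zero if the connection or the chain verification fails.
check_vault_tls() {
  openssl s_client -connect "$VAULT_HOSTPORT" -servername "$VAULT_SNI" \
    -CAfile "$PROXY_CA_FILE" -verify_return_error </dev/null
}

# Usage:
#   show_ca_agent_journal
#   check_vault_tls && echo "handshake and chain OK"
```

If your listener requires a client certificate, the handshake will likely need `-cert` and `-key` added to the `openssl s_client` call as well.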
6b) Restart proxy stacks (systemd --user)
The run restarted:
- the proxy container unit
It showed active (running) immediately after.
Why restart matters: if the proxy reads certs/chain from disk, it will keep the old material until you reload or restart.
2b) Restart Vault server container (systemd --user)
The run restarted:
- the Vault container unit
It showed active (running) right away, but “running” is not the same as “ready to accept TLS connections.”
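Both restarts follow the same pattern, so a small helper is enough. This is a sketch, not the script from the run, and the unit names in the usage lines are placeholders.

```shell
# Restart a systemd user unit, then ask systemd whether it is active.
# "active" only means the unit's process started; it says nothing about
# whether the TLS listener inside the container is accepting connections.
restart_user_unit() {
  unit="$1"
  systemctl --user restart "$unit" || return 1
  systemctl --user is-active --quiet "$unit"
}

# Usage (hypothetical unit names):
#   restart_user_unit proxy-container.service
#   restart_user_unit vault-container.service
```

A zero exit here is the same signal as the "active (running)" line: necessary, but not proof of readiness.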
7) The TLS healthcheck failed (in this run)
Immediately after the restarts, the healthcheck printed:
Error checking seal status: Get "https://127.0.0.1:<vault-port>/v1/sys/seal-status": read ...: connection reset by peer
OpenSSL showed:
write:errno=104
The script still ended with:
[FULL ROTATE] mTLS FULL ROTATION COMPLETED.
So: rotation steps completed, but the final “are we alive?” check caught a real failure at that moment.
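One way to stop the banner from masking a failed check is to make the banner depend on the healthcheck's exit code. A minimal sketch follows; the success banner mirrors the run, while the failure message and the function name are my own invention.

```shell
# Run the healthcheck command passed as arguments. Print the completion
# banner only if it succeeds; otherwise report the failure on stderr and
# propagate the healthcheck's exit code to the caller.
finish_rotation() {
  if "$@"; then
    echo "[FULL ROTATE] mTLS FULL ROTATION COMPLETED."
  else
    rc=$?
    echo "[FULL ROTATE] rotation steps ran, but the healthcheck failed (exit $rc)." >&2
    return "$rc"
  fi
}

# Usage:
#   finish_rotation vault status || exit 1
```

With this shape, a connection reset at the end leaves the script with a non-zero exit, so whatever called it can tell that the rotation is not yet confirmed.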
What “connection reset” usually means here
At this exact point in the timeline (right after a container restart), common causes are:
- Vault is still starting and the listener is not ready
- Vault restarted and dropped the connection
- the TLS listener is up but loaded a bad key/cert/chain
- something on 127.0.0.1:<vault-port> accepts the connection and immediately closes it
Operational takeaway: treat the healthcheck as the gate that tells you whether to intervene now.
Immediate response checklist (fast, low drama)
1) Add a short pause
sleep 20
2) Re-run the simplest Vault check
vault status
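If it works, you have validated the listener, the TLS handshake, and CA trust in one hit. Note that vault status reads the unauthenticated seal-status endpoint, so it does not exercise token auth.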
3) If it still resets, check unit logs
Review the user journals for the Vault and proxy services involved in the restart.
4) Re-check TLS handshake with explicit SNI
Repeat the local OpenSSL check to confirm the listener is back with the expected certificate.
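For step 4, it helps to print which certificate the listener is actually serving, so you can compare subject and expiry against what the rotation issued. A sketch, with host:port and server name passed in as arguments (the values in the usage line are placeholders):

```shell
# Print subject, issuer, and expiry of the certificate the listener serves.
# $1 = host:port of the listener, $2 = TLS server name (SNI) to send.
show_served_cert() {
  hostport="$1"
  sni="$2"
  openssl s_client -connect "$hostport" -servername "$sni" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}

# Usage (hypothetical values):
#   show_served_cert 127.0.0.1:8200 vault.example.internal
```

If the connection is still being reset, the second `openssl` gets no certificate and reports an error, which is itself a useful signal that the listener is not back yet.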
The big lesson
Ordering is necessary but not sufficient. The rotation can be perfect and still fail because readiness after restart is a timing and stability problem.
The best hardening is not more steps. It is:
- a short readiness delay
- or a retry wrapper around the healthcheck
No new concepts, just a less racy end state.
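The retry wrapper can be as small as the sketch below. The function name, attempt count, and delay are illustrative choices, not something the run used.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "healthcheck still failing after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example: up to 6 attempts, 5 seconds apart.
#   retry 6 5 vault status
```

A bounded retry beats a fixed sleep because it returns as soon as Vault answers and still fails loudly if it never does. Keep in mind that vault status also exits non-zero when Vault is sealed, so this only converges if Vault comes back unsealed after the restart.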