Stateful Recon With Diffs and Cumulative Files

TL;DR

Recon gets much more useful once you keep state between runs.
The main output I care about is not a raw snapshot. It is the diff.
Plain text state files are enough if the format stays stable.

Stateless scans waste attention

If every run starts from zero and only prints the current live services, you end up reading the same inventory over and over. That is fine for ad hoc work. It is a poor fit for recurring automation.

The more useful question is:

What changed since the last time I looked?

That question pushed me toward explicit state files.

The three outputs that matter

For recurring probing, I care about three categories:

new services
changed services
removed services

That classification is simple, but it gives the whole pipeline shape. It tells me where to look first and what to ignore.
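
As a minimal sketch of that classification in Python: assume each run produces a mapping from URL to a small record of observed fields. The function name, the `Diff` type, and the record shape are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical record shape: (status code, title, technology fingerprint).
Record = Tuple[int, str, str]


class Diff(NamedTuple):
    new: List[str]       # URLs seen now but not in the previous run
    changed: List[str]   # URLs seen in both runs with different records
    removed: List[str]   # URLs seen previously but not now


def diff_services(previous: Dict[str, Record], current: Dict[str, Record]) -> Diff:
    """Classify URLs into new, changed, and removed between two runs."""
    prev_urls = set(previous)
    curr_urls = set(current)
    new = sorted(curr_urls - prev_urls)
    removed = sorted(prev_urls - curr_urls)
    changed = sorted(
        url for url in prev_urls & curr_urls if previous[url] != current[url]
    )
    return Diff(new=new, changed=changed, removed=removed)
```

Sorting the URLs keeps the report order stable between runs, so the diff output itself can be compared from one day to the next.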

Why cumulative state helps

A cumulative file becomes the reference point for future runs. It lets the pipeline compare fresh results against remembered results and update the baseline after each pass.

This gives me two benefits:

I can review only deltas in the morning.
I can still fall back to the current cumulative state when I want the bigger picture.
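
One way to sketch the load-and-save side of a cumulative file in Python is shown here. It assumes one record per line with the URL as the first tab-separated field. That layout and the function names `load_state` and `save_state` are assumptions for illustration, not necessarily what the reference repo does.

```python
import os
from pathlib import Path
from typing import Dict


def load_state(path: Path) -> Dict[str, str]:
    """Read the cumulative file into a dict mapping URL to its full line."""
    state: Dict[str, str] = {}
    if not path.exists():
        return state  # first run: no baseline yet
    with path.open(encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if line.strip():
                url = line.split("\t", 1)[0]
                state[url] = line
    return state


def save_state(path: Path, state: Dict[str, str]) -> None:
    """Write the baseline to a temp file, then swap it into place."""
    tmp = path.with_name(path.name + ".tmp")
    body = "".join(state[url] + "\n" for url in sorted(state))
    tmp.write_text(body, encoding="utf-8")
    os.replace(tmp, path)  # replaces the old baseline in one step
```

Writing to a temporary file first means an interrupted run is less likely to leave a half-written baseline behind.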

The file format does not need to be fancy

I do not think this kind of pipeline needs a database early on. A stable text format is often enough.

For example, a line-oriented state file can hold:

URL
status code
title
technology fingerprint

That is enough structure to detect meaningful service changes without creating a large maintenance burden.
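
For illustration, here is a minimal Python sketch of one such line format: four tab-separated fields in a fixed order. The tab delimiter, the `Service` type, and the function names are assumptions; any stable delimiter that cannot appear inside the fields would work the same way.

```python
from typing import NamedTuple


class Service(NamedTuple):
    url: str
    status: int
    title: str
    tech: str


def _clean(field: str) -> str:
    """Keep each record on one line by flattening tabs and newlines."""
    return field.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def format_line(svc: Service) -> str:
    """Serialize a service as URL, status, title, tech separated by tabs."""
    fields = [svc.url, str(svc.status), svc.title, svc.tech]
    return "\t".join(_clean(f) for f in fields)


def parse_line(line: str) -> Service:
    """Parse one line written by format_line back into a Service."""
    url, status, title, tech = line.rstrip("\n").split("\t")
    return Service(url, int(status), title, tech)
```

Because the field order and delimiter never change, two lines for the same URL can be compared as plain strings to detect a change.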

The important part is update logic

The files are only useful if the update behavior is disciplined:

add new entries
replace changed entries
remove entries that are no longer live

That keeps the cumulative view clean instead of slowly filling up with dead data.
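
The three rules above can be sketched in Python as follows, assuming both the baseline and the fresh results are dicts mapping a URL to its record line. The function name `update_baseline` and the returned counts are hypothetical.

```python
from typing import Dict, Tuple


def update_baseline(baseline: Dict[str, str], fresh: Dict[str, str]) -> Tuple[int, int, int]:
    """Bring baseline in line with fresh, in place.

    Returns (added, replaced, removed) counts for logging.
    """
    added = replaced = removed = 0

    for url, record in fresh.items():
        if url not in baseline:
            baseline[url] = record   # add new entries
            added += 1
        elif baseline[url] != record:
            baseline[url] = record   # replace changed entries
            replaced += 1

    for url in list(baseline):       # copy keys: we delete while iterating
        if url not in fresh:
            del baseline[url]        # remove entries that are no longer live
            removed += 1

    return added, replaced, removed
```

When the fresh results cover every target, the baseline ends up equal to the fresh results. Spelling out the three steps is still useful because the counts give a quick sanity check on each run.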

Why I also keep history snapshots

The cumulative file tells me the current truth as far as the pipeline knows. History snapshots tell me how that truth evolved.

That matters when I want to answer questions like:

Did this endpoint appear recently?
Was this title change new today or last week?
Did the pipeline miss a change because one run failed?

Even small history files make debugging much easier.
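
A history snapshot can be as simple as a timestamped copy of the cumulative file. The Python sketch below assumes that approach. The `snapshot` function, the directory layout, and the file naming scheme are all hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot(state_file: Path, history_dir: Path, now: Optional[datetime] = None) -> Path:
    """Copy the cumulative file into history_dir under a UTC timestamped name."""
    now = now or datetime.now(timezone.utc)
    history_dir.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = history_dir / f"{state_file.stem}-{stamp}{state_file.suffix}"
    shutil.copy2(state_file, target)
    return target
```

Timestamps in this form sort lexically in time order, so a plain directory listing already shows the sequence of runs, and a missing timestamp points at a run that did not complete.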

What this changed in practice

Once the pipeline had state, the review process became calmer. I was no longer staring at a giant service inventory every day. I was reading a list of actual changes.

That is the main point of stateful recon: not more data, but less repeated reading.

Takeaway

If you are automating recon, persistent state is one of the first features worth adding. The simplest possible version already gives a big return because it turns snapshots into change detection.

Public reference repo

The sanitized reference repository for this series is here: jenkins-recon. It contains public example cumulative state files, diff outputs, and the update logic behind this write-up.