Declarative Confidence

October 25, 2023

Can systems validate our intentions for us?

Background

A message comes into the Slack channel.

The latest build doesn’t have any logs

Ouch.

After verifying the logs worked in older builds, we confirm something broke recently.

The config is versioned alongside our application. It is simple and clearly expresses intent.

logging:
  remote: true

Git history shows it has not changed since the last working build.

After some triaging and experimenting, we act fast to get a fix out. It rolls through our pipeline and the logs are back.

Unexpectedly losing logs has happened a few times now. Our confidence in the deployment process is hurting. How can we be confident this won’t catch us off-guard again?

Building Confidence

To confidently roll out changes, my team invests in acceptance tests. The suite exercises the system in ways we expect users to. Regressions are automatically caught and rollouts automatically stopped.

We can include a logs test in the same acceptance test suite to be confident we won’t lose logs again.
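A minimal sketch of what such a test might look like, written with Python's unittest. The names here are assumptions for illustration: `fetch_remote_logs`, the build id, and the expected startup message are hypothetical stand-ins, not the real Remote Logs Service API.

```python
import unittest
from typing import List

# Hypothetical: a message the application is assumed to log on every startup.
EXPECTED_STARTUP_MESSAGE = "application started"


def fetch_remote_logs(build_id: str) -> List[str]:
    """Hypothetical stand-in for querying the Remote Logs Service.

    A real test would call the service's client here; this stub returns
    a canned line so the sketch runs on its own.
    """
    return [f"{build_id} application started"]


class RemoteLogsAcceptanceTest(unittest.TestCase):
    def test_latest_build_ships_logs(self):
        lines = fetch_remote_logs("build-123")
        self.assertTrue(
            any(EXPECTED_STARTUP_MESSAGE in line for line in lines),
            "expected startup message not found in remote logs",
        )
```

If the deployed build stops shipping logs, this test fails and the rollout stops with the rest of the suite.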

Why Acceptance Test?

The configuration is simple, but it masks a lot of complexity.

Setting this one property to true creates expectations on components we do not control. As each component evolves, we could see changes in behavior.

The app could move to a new account or region with different access rules
Same for the Remote Logs Service
The sidecar could be misbehaving on the latest release of the Base
The app’s communication with the sidecar could have a breaking change
The sidecar’s communication with the Service could have a breaking change

Why Not Acceptance Test?

One downside of acceptance testing platform-level features is that every interested team must write and maintain its own copy of the test.

Will everyone want it? Definitely.

Someone enabled logs with this excellent declarative config. Can that config also be validated automatically by the folks who own the Remote Logs Service?

Building Confidence for All

To verify our intent works for everyone, we have some requirements.

Trigger: when the application is deployed
Expectation: a message we can always count on showing up
Log access: a way to programmatically fetch logs for our deployment
Expected duration: how long to wait for logs
Target: somewhere to report our results

If we have a reliable log statement to expect, we can run a validation task when applications are deployed. Expected durations allow timing out and reporting this state back instead of waiting indefinitely. The output of that validation can be sent back to the delivery platform. The platform can then block, notify, or act as it sees fit.
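The validation task described above could be sketched roughly as follows. This is an illustration under assumptions, not a real platform API: `fetch_logs` stands in for log access, `report` stands in for the target, and every name and status string is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ValidationResult:
    status: str            # "passed" or "timed_out" (hypothetical status names)
    waited_seconds: float  # how long the task waited before deciding


def validate_logs(
    fetch_logs: Callable[[], Iterable[str]],     # log access (hypothetical)
    expected_message: str,                       # expectation
    timeout_seconds: float,                      # expected duration
    report: Callable[[ValidationResult], None],  # target (hypothetical)
    poll_interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> ValidationResult:
    """Poll for the expected message until it appears or the timeout passes."""
    start = time.monotonic()
    while True:
        found = any(expected_message in line for line in fetch_logs())
        waited = time.monotonic() - start
        if found:
            result = ValidationResult("passed", waited)
            break
        if waited >= timeout_seconds:
            # Report the timeout instead of waiting indefinitely.
            result = ValidationResult("timed_out", waited)
            break
        sleep(poll_interval_seconds)
    report(result)
    return result
```

The trigger is left out of the sketch: the delivery platform would call `validate_logs` once per deployment and decide what to do with the reported result.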

Declarative config at its best is intention-revealing. If we can take those intentions and automatically validate them, we can continuously build confidence.

Thanks to Roberto Pérez Alcolea for providing feedback on an early draft of this post.