Can systems validate our intentions for us?
Background
A message comes into the Slack channel.
The latest build doesn’t have any logs
Ouch.
After verifying the logs worked in older builds, we confirm something broke recently.
The config is versioned alongside our application. It is simple and clearly expresses intent.
logging:
  remote: true
Git history shows it has not changed since the last working build.
After some triaging and experimenting, we act fast to get a fix out. It rolls through our pipeline and the logs are back.
Unexpectedly losing logs has happened a few times now. Our confidence in the deployment process is hurting. How can we be confident this won’t catch us off-guard again?
Building Confidence
To confidently roll out changes, my team invests in acceptance tests. The suite exercises the system in ways we expect users to. Regressions are automatically caught and rollouts automatically stopped.
We can include a logs test in the same acceptance test suite to be confident we won’t lose logs again.
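A logs test of this kind might look like the sketch below. It is only an illustration: `emit` and `fetch_remote_logs` are hypothetical hooks standing in for however the application writes a log line and however the team reads logs back from the remote service, and the timeout values are placeholders.

```python
import time
import uuid
from typing import Callable, Iterable


def logs_reach_remote(
    emit: Callable[[str], None],                     # hypothetical: makes the app log a line
    fetch_remote_logs: Callable[[], Iterable[str]],  # hypothetical: reads lines from the remote service
    timeout_seconds: float = 60.0,
    poll_interval_seconds: float = 2.0,
) -> bool:
    """Return True if a freshly emitted, unique log line shows up remotely in time."""
    # A unique marker avoids matching a line left over from an older build.
    marker = f"acceptance-log-check-{uuid.uuid4()}"
    emit(marker)

    deadline = time.monotonic() + timeout_seconds
    while True:
        if any(marker in line for line in fetch_remote_logs()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_seconds)
```

In the suite, the test would assert that `logs_reach_remote(...)` returns `True`, so a build that silently drops logs fails the rollout like any other regression.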
Why Acceptance Test?
The configuration is simple, but it masks a lot of complexity.
Setting this one property to true creates expectations on components we do not control.
As each component evolves, we could see changes in behavior.
- The app could move to a new account or region with different access rules
- Same for the Remote Logs Service
- The sidecar could be misbehaving on the latest release of the Base
- The app’s communication with the sidecar could have a breaking change
- The sidecar’s communication with the Service could have a breaking change
Why Not Acceptance Test?
One downside of acceptance testing platform-level features is that every interested user must replicate the test.
Will everyone want it? Definitely.
Someone enabled logs with this excellent declarative config. Can that config also be validated automatically by the folks who own the Remote Logs Service?
Building Confidence for All
To verify our intent works for everyone, we have some requirements.
- Trigger: when the application is deployed
- Expectation: a message we can always count on showing up
- Log access: a way to programmatically fetch logs for our deployment
- Expected duration: how long to wait for logs
- Target: somewhere to report our results
If we have a reliable log statement to expect, we can run a validation task when applications are deployed. Expected durations allow timing out and reporting this state back instead of waiting indefinitely. The output of that validation can be sent back to the delivery platform. The platform can then block, notify, or act as it sees fit.
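The requirements above can be sketched as a small validation task. This is a minimal illustration under assumptions, not how any particular delivery platform works: `fetch_logs` and `report` are hypothetical callables standing in for the log access and the reporting target, and the `Outcome` and `LogValidation` names are invented for the example.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    PASSED = "passed"
    TIMED_OUT = "timed_out"


@dataclass
class LogValidation:
    expected_message: str               # Expectation: a message we can count on
    timeout_seconds: float              # Expected duration: how long to wait
    poll_interval_seconds: float = 1.0


def validate_logs(
    deployment_id: str,
    check: LogValidation,
    fetch_logs: Callable[[str], Iterable[str]],  # Log access (hypothetical)
    report: Callable[[str, Outcome], None],      # Target (hypothetical)
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Run once per deployment (the trigger) and report whether logs arrived."""
    deadline = clock() + check.timeout_seconds
    outcome = Outcome.TIMED_OUT
    while True:
        lines = fetch_logs(deployment_id)
        if any(check.expected_message in line for line in lines):
            outcome = Outcome.PASSED
            break
        if clock() >= deadline:
            break  # give up and report the timeout instead of waiting forever
        sleep(check.poll_interval_seconds)
    # The platform decides what to do with the result: block, notify, or act.
    report(deployment_id, outcome)
    return outcome
```

Passing `clock` and `sleep` as parameters is a design choice made here so the task can be tested without real waiting; a production version would likely also distinguish "log service unreachable" from "message not found".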
Declarative config at its best is intention-revealing. If we can take those intentions and automatically validate them, we can continuously build confidence.
Thanks to Roberto Pérez Alcolea for providing feedback on an early draft of this post.