Friction works to slow the advancement of moving objects. It stops progress. In a combustion engine, moving parts create heat when not properly oiled. That heat, in turn, wears down the mechanical parts of your engine. In other words, friction steals momentum and wreaks havoc on your car.
In a release process, friction isn’t caused by a lack of oil, but rather by the presence of toil. Toil consists of those manual, mundane, low-value tasks that steal the momentum of your software development. One example of toil is a set of scripts that must be manually executed in the right environment, in a precise order, with elevated permissions, during a predefined stage of the release process for the deployment to succeed. If you’ve ever been tasked with babysitting a release process, you have been affected by toil.
Whereas friction creates heat, toil produces burnout. As the mental health of your team deteriorates, so does the quality of your development. Engineers stop writing tests, spaghetti code consumes your application, and shortcuts become the norm instead of the exception. The sworn enemy of toil is automation, and our task as DevOps engineers should be to aggressively seek out toil wherever possible and replace it with automated processes.
Once we’ve settled on eliminating toil as an operating principle, the next step is to put to bed what I’ll call the epic release. The epic release is when all the code written by distributed teams over the last sprint or quarter is deployed into your production environment like a boulder heaved into a glass house. This behemoth shakes the ground with each step. It’s so big, in fact, that it takes a myriad of manual processes just to stand it up and verify its success, hence its relationship with toil. For years, deployed artifacts were very large, containing layers upon layers of changes: frontend resources, backend services, middleware, shared libraries, and so on. An epic release is very difficult to automate, and it isn’t the preferred approach anymore. Let’s look at an alternative.
The solution to the problem above is to abandon the epic release entirely in favor of smaller, more frequent, incremental deployments. To do this, you’ll need to shift your testing earlier in the development process to ensure your code is validated, benchmarked, and free from defects. You’ve probably heard the term Shift Left; it essentially means finding defects earlier in the process. This includes more than just writing a whole slew of unit tests. Performance tests, integration tests, and smoke tests could all be part of validating new code introduced into your main branch. A fully automated testing strategy produces confidence that a change is ready for production now, not at the end of the quarter. Automation allows us to push out code more frequently, at any time of the day, in a way that should be transparent to most of your users.
If a feature has a go-live date, try leveraging something like Feature Toggles to enable that functionality. The code can ship ahead of time and stay dormant until the toggle is switched on. Using the deployment of new code itself as the activation lever is slower, harder to debug, and can be disruptive to your users, not to mention costly if that feature has to be rolled back.
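To make the idea concrete, here is a minimal sketch of a date-driven toggle in Python. Everything in it is hypothetical and for illustration only: the `FEATURE_TOGGLES` dictionary, the `new_checkout` feature name, and the go-live date are assumptions, and a real system would more likely read toggles from a configuration service than from an in-process dictionary.

```python
from datetime import datetime, timezone

# Hypothetical in-memory toggle store (an assumption for this sketch).
# "override" forces the feature on (True) or off (False); None means
# "follow the go-live date".
FEATURE_TOGGLES = {
    "new_checkout": {
        "go_live": datetime(2030, 1, 1, tzinfo=timezone.utc),
        "override": None,
    },
}


def is_enabled(name, now=None, toggles=FEATURE_TOGGLES):
    """Return True if the named feature should be active at time `now`."""
    toggle = toggles.get(name)
    if toggle is None:
        # Unknown features stay off by default.
        return False
    if toggle["override"] is not None:
        # An explicit override wins, which allows a rollback without a redeploy.
        return toggle["override"]
    now = now or datetime.now(timezone.utc)
    return now >= toggle["go_live"]


def checkout(cart_total, now=None):
    """Both code paths are deployed; the toggle picks one at runtime."""
    if is_enabled("new_checkout", now):
        return f"new checkout: {cart_total:.2f}"
    return f"legacy checkout: {cart_total:.2f}"
```

In this sketch, rolling the feature back is a matter of setting `override` to `False`; no new deployment is involved, which is the property that makes toggles attractive for go-live dates.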
Another solution is to take advantage of versioning in your API. Services like Azure API Management can seamlessly direct requests to a specific version of your choice and reroute them instantly.
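The routing idea can be sketched in a few lines of Python. To be clear, this is not the Azure API Management API; it is a toy in-process router, and the `VersionedRoute` class, the handler functions, and the `v1`/`v2` labels are all made up for illustration. It only shows the principle: several versions are deployed side by side, and changing which one receives default traffic is a configuration change rather than a release.

```python
def get_orders_v1():
    # Hypothetical original response shape.
    return {"schema": 1, "orders": []}


def get_orders_v2():
    # Hypothetical newer response shape, deployed alongside v1.
    return {"schema": 2, "orders": [], "page": 1}


class VersionedRoute:
    """Toy router: maps version labels to handlers and tracks a default."""

    def __init__(self, handlers, default_version):
        if default_version not in handlers:
            raise ValueError(f"unknown default version: {default_version}")
        self.handlers = dict(handlers)
        self.default_version = default_version

    def set_default(self, version):
        """Point unversioned traffic at another already-deployed version."""
        if version not in self.handlers:
            raise ValueError(f"unknown version: {version}")
        self.default_version = version

    def dispatch(self, requested_version=None):
        """Call the handler for the requested version, or the default."""
        version = requested_version or self.default_version
        handler = self.handlers.get(version)
        if handler is None:
            raise LookupError(f"unsupported API version: {version}")
        return handler()


route = VersionedRoute({"v1": get_orders_v1, "v2": get_orders_v2}, "v1")
```

Calling `route.set_default("v2")` moves unversioned callers to the new handler, while clients that explicitly ask for `v1` keep working; switching back is the same one-line change.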
If the only task your Azure DevOps pipelines are responsible for is compiling and deploying app code, you aren’t taking full advantage of this powerful feature. Pipelines can do so much more, including (but not limited to) executing PowerShell scripts, staging workloads, and deploying infrastructure changes alongside your app code.
Removing friction from your release process requires a commitment to stop doing things the way they’ve always been done. We need to agree that toil is bad and that the solution to toil is automation. To take full advantage of automation, though, we must abandon the epic release in favor of something better. Like a well-oiled engine, releases should be frequent, predictable, and nondisruptive.
For more information about release engineering, check out the Operational Excellence pillar of the Well-Architected Framework at https://aka.ms/azurewellarchitected.
See my interview with @davidblankedelman and ask me questions about this topic below!
Posted at https://bit.ly/3jPz9X6