What do SREs have to do with project-based services?
A lot has been said and written about how site reliability engineers (SREs) are shaping (and reshaping) the IT operations of modern applications. There are a few areas where we still debate the benefits of having SREs on the scene. Nevertheless, in most cases, site reliability engineering is the answer.
Recently I was asked what SREs have to do with project-based services. Imagine short-term projects where we do a technology refresh or implement a new infrastructure model, mostly through the work of architects. Such projects typically have no Ops phase per se, as they last only a few months. Would site reliability engineering still be relevant?
My immediate reaction was to defend our flag. If we think about the "secure by design" and "build to manage" principles, an SRE would make a difference in such projects. Designing applications (and IT infrastructure) to be secure from the start and building them to be easier to manage (and observable) are things SREs can drive.
Of course, I was not satisfied with my answer, so I decided to put my head down and think about this more pragmatically. Sometimes, when you need to correlate different things, there's no better way than going back to basics. So I revisited the SRE tenets and checked which ones would apply to short-term project execution.
Scale Ops with Load
Most likely, on a three-month project, we won't be able to automate all runbooks (the way we operate an application in production). However, SREs can design such runbooks in a way that makes automation not only possible but effortless. Writing runbooks as algorithms or pseudocode might allow future automated code generation in any scripting language. Since SREs are developing such documentation, they will also feed their findings back into the architectural design to improve it from an operations scalability perspective. Not just that, better architecture also leads to less Ops toil. In this respect, architects and SREs should be best friends forever.
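To make the idea of a runbook written as an algorithm more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Step` structure, the `run_runbook` helper, the disk-pressure scenario and its thresholds are all hypothetical and not taken from any particular tool. The point is only that each step pairs a check with an action, so the manual instruction of today can be swapped for a script later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One runbook step: a health check and the action to take if it fails."""
    description: str
    check: Callable[[Dict[str, float]], bool]  # True means healthy, nothing to do
    action: str  # a manual instruction today, possibly a script name later


def run_runbook(steps: List[Step], state: Dict[str, float]) -> List[str]:
    """Walk the steps in order and collect the action of every failed check."""
    actions = []
    for step in steps:
        if not step.check(state):
            actions.append(step.action)
    return actions


# Hypothetical runbook for a disk-pressure alert; the thresholds are made up.
disk_runbook = [
    Step("Disk usage below 80%",
         lambda s: s["disk_used_pct"] < 80,
         "Rotate and compress old logs"),
    Step("Inode usage below 90%",
         lambda s: s["inode_used_pct"] < 90,
         "Purge the temp file directory"),
]

actions = run_runbook(disk_runbook, {"disk_used_pct": 91.0, "inode_used_pct": 42.0})
print(actions)  # ['Rotate and compress old logs']
```

Because the steps are plain data, the same runbook can be reviewed by architects as documentation and executed by operators, which is where the feedback into the design tends to come from.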
Observability
As common ground with DevOps, observability requirements should be part of software engineering from the start, captured as both functional and non-functional requirements. SREs are experts in system reliability who can provide valuable insights into application or infrastructure designs to make them more observable. This goes beyond mere infrastructure monitoring: it is about the golden signals, where the end-user experience is incorporated and considered.
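As a rough illustration of what the golden signals mean in practice, the sketch below derives the four commonly cited ones (latency, traffic, errors, saturation) from a handful of request records. The `Request` record, the `golden_signals` function, the use of CPU percentage as the saturation measure and the sample numbers are all assumptions made for this example; a real project would pull these from its monitoring stack.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests: List[Request], window_s: float,
                   cpu_used_pct: float) -> Dict[str, float]:
    """Summarise one observation window into the four golden signals."""
    if not requests:
        return {"latency_p95_ms": 0.0, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation_pct": cpu_used_pct}
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile: smallest value covering 95% of requests.
    rank = max(1, math.ceil(0.95 * len(latencies)))
    server_errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": server_errors / len(requests),
        "saturation_pct": cpu_used_pct,  # assumption: CPU stands in for saturation
    }


# Made-up sample: four requests observed over a two-second window.
sample = [Request(120, 200), Request(80, 200), Request(950, 500), Request(100, 200)]
signals = golden_signals(sample, window_s=2.0, cpu_used_pct=63.0)
print(signals)
```

Three of the four numbers describe what the end user experiences rather than how busy a server is, which is the distinction from plain infrastructure monitoring.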
Ops Readiness Review (ORR)
That's the last tenet I can correlate between solution architecting and site reliability engineering. It is by far the most straightforward one, as it's all about readiness for the operations phase. SREs can drive continuous improvement in operational readiness reviews (ORR). Working closely with architects, they can check system readiness before releasing a version of an app or infrastructure component to production. ORR scorecard definitions and metrics will be optimal when they are an output of such collaboration.
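One possible shape for an ORR scorecard is a weighted checklist, sketched below. The item names, weights, the 80-point threshold and the `OrrItem`, `orr_score` and `ready_for_release` names are all hypothetical; in the spirit of the paragraph above, the real definitions would be agreed between architects and SREs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OrrItem:
    name: str
    weight: int   # relative importance agreed by architects and SREs
    passed: bool


def orr_score(items: List[OrrItem]) -> float:
    """Return the weighted share of passed items as a 0-100 score."""
    total = sum(item.weight for item in items)
    if total == 0:
        raise ValueError("scorecard needs at least one weighted item")
    earned = sum(item.weight for item in items if item.passed)
    return 100.0 * earned / total


def ready_for_release(items: List[OrrItem],
                      threshold: float = 80.0) -> Tuple[bool, List[str]]:
    """Compare the score to a threshold and list the open gaps."""
    gaps = [item.name for item in items if not item.passed]
    return orr_score(items) >= threshold, gaps


# Hypothetical scorecard for one release candidate.
scorecard = [
    OrrItem("Runbooks written for known failure modes", 3, True),
    OrrItem("Golden signals dashboard in place", 3, True),
    OrrItem("Alerts routed to an on-call rotation", 2, True),
    OrrItem("Rollback procedure tested", 2, False),
]

ready, gaps = ready_for_release(scorecard)
print(orr_score(scorecard), ready, gaps)
```

Returning the list of gaps alongside the verdict keeps the review actionable: even a passing release shows what still needs work.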
What other activities can you uncover for project-based services with SREs? Please put your suggestions in the comments section below.
As the times demand, stay safe, protect your loved ones, and help others!
Thanks, Rod.
Site Reliability Engineer
Blameless Post Mortems is another tenet that is useful in Project Delivery. Most projects will encounter some kind of issue during testing phases of the project delivery (that's why we test). Embedding this process into projects can really fast track remediation of issues encountered to keep project delivery on track.
Retired Distinguished Engineer
“Architects and SREs should be best friends forever” I love it!