Definition of “SRE” What is Site Reliability Engineering?
Site Reliability Engineering, or short-SRE is a. Google-developed Service Management model. Development and operation of large distributed systems are closely linked. The control processes constitute a specification of the DevOps philosophy.
CompaniesIn Site Reliability Engineering, the Development Team also includes members who also have an Operations Background.
(© vladimircaribb – stock.adobe.com)
Site Reliability Engineers bridge the gap between development and operations by applying on issues of system administration software technical way of thinking. Share your work evenly between operating and development tasks. The ideal candidate for a SRE Post is, therefore, either a Software engineer with a strong Background as an Administrator or a highly skilled system administrator with experience in programming and automation.
SRE staff will investigate the operation of the resilience and vulnerability of the systems, in order to make optimization and scaling opportunities. In addition, the search for solutions, the handling of systems with the same priority to simplify. Experience in the operation of large IT infrastructures, flows directly into the development of the structures.
Does not belong to the concept that SREs spend more than half of their time for the operation. A violation of this rule will be regarded as a sign for a lack of implementation.
The SRE-concept was made successfully, and is known by the company Google, which it has long cultivated, before the company’s principles to the public. It has the same goals of process improvement as the DevOps philosophy. DevOps was coined around 2008, and team-wide collaboration is the culture. All of the instances associated with the same Vision in line, it’s a shared responsibility should be established to Succeed.
SRE and DevOps
Before the proliferation of DevOps development worked and operations teams independently, each with its own objectives and targets. In order to better communicate and smooth cooperation to work, advanced DevOps Teams to the most important in any company.
DevOps and SRE, are used both to reduce the gap between software development and software operation, with the aim of improving the Release cycle for complex distributed systems. The DevOps concept defines how the results should look like; it stands for a cultural change within a company. In the case of SRE is to make the theoretical DevOps-approach with the right methods and tools, as a workflow.
SRE includes the ongoing automation of manual tasks and the continuous Integration (Continuous Integration) and delivery (Continuous Delivery). SREs are responsible for operating reliability and automation throughout the infrastructure life cycle, monitor, provision and operation of the Releases.
The 5 basic principles of the DevOps philosophy and its implementation by SRE
1. Organizational Silos to reduce
Large companies have a complex organizational structure with a variety of Teams, who often work separately in the “Silos”. Each Team has a different perspective on the Whole thing, which promotes inefficiency. The task of DevOps and SREs is the team with each other better on the overall goals and a common Vision towards a vote.
2. Failures reputation as a process belonging
DevOps assumes that failures are part of the process, and helpful to learn from. SREs make sure that there are too many failures or outages. To do this, you use formulas to failures to be balanced with the release of new versions: Service Level indicators (SLIs) and Service Level objectives (SLOs). SLIs to measure failures over time. A SLO is an agreement within a Service-Level Agreements on a certain metric, such as operating time or response time, which is to be observed.
From the SRE-perspective-a clear Agreement between the Business and the IT level is required, the optimum goals for Service Level objectives and Service-Level set of indicators. Any violation will result to evaluate the goals and to optimize.
The SRE policies to encourage radical changes within certain limits. SREs have a risk budget, in order to test these limits and be potentially faster, innovative. SRE quantified this risk is acceptable as an “error budget”. If the error budgets are exhausted, shifting the focus of the development on the improvement of the reliability. Availability and further development to be balanced.
3. Changes in quick little steps to implement
How to DevOps, SRE promotes continuous improvement through small and frequent development steps. In short iteration, any negative impact cycles are less serious, and low-risk improvements can easily (as tested automated), and implemented.
4. Common tools and automation use
Incompatibility and problems of integration between technologies from different vendors, epochs, and use cases can create in a DevOps environment Silos. SRE a uniform technologies and cross-access to information in the various IT Teams. SRE requires that all Teams working on the same Service, use of the same technologies.
SRE followed the principle to automate manual tasks, the repetitive and reactive, and no lasting improvement. The automation is to make capacity for work, which brings long-term Benefits.
5. Reliability measurement of the data is based
The evaluation of suitable targets for Reliability (reliability) for DevOps and SREs a contextual challenge. SREs ensure that all levels in the company are in agreement, as the reliability is measured and what you should do is, if the value meets the requirements.
DevOps-key metrics are the number of Deployments of the time, the lead time of the commitment up to the release, the number of failed Deployments, and the required recovery time.
The basics for SRE are the Service-Level objective (SLO) and the Service Level indicator (SLI). SREs use these metrics to determine whether a Release is going to go with a Change in operation or not.