Downtime: Kill your maintenance windows!

Even if you don’t see maintenance windows as downtime – your customers do. […]

A few years ago I bought a “smart” thermostat for my home. I wanted to be able to check and control the temperature on the go. The device was quickly set up and connected to the manufacturer’s cloud backend. All good? Not quite. A few weeks later I received an email from the manufacturer about an upcoming “service upgrade”. It was supposed to take several months, during which the company intended to shut down its application for several hours at a time, at various times of day (which, of course, were not specified). Naturally, the company apologized in advance for the inconvenience. So my “smart” thermostat was going to fail at seemingly random times, for hours at a stretch, over a period of months? That was too much for me. The next day I exchanged the product for one from another manufacturer. Poor service drives customers away for good: availability is essential in the digital age.

Another real-world example: to receive certain benefits, my son has to report his income to the US government. He uses a smartphone app to do this. Once a month, he logs in to report his earnings for the previous month. However, this (iPhone) application has a big drawback: it only works Monday to Friday, and then only between 8 am and 5 pm. A SaaS-based online application that only works during “normal” business hours is many things from the user’s point of view, but easy to use is not one of them. Why restrict the hours of use for such an app? The only explanation I can think of is that it is run by a government agency. Who there would operate the app outside business hours or fix an error?

Admittedly, these two examples are extreme cases. But they highlight a common problem with many online applications: the companies operating them define maintenance windows during which they regularly take their applications offline to perform routine maintenance and updates. Quite a few companies believe that such planned downtime does not count as downtime. Nothing could be further from the truth. Downtime is downtime. Whether scheduled or unplanned, if your customers want to use your application and it is not available for whatever reason, it is downtime.

Without ensuring a high level of availability, you cannot run modern, digital online applications and services. Today, customers expect services to be operational when they want to use them and do not tolerate downtime. It’s bad enough if a failure affects your availability. If you plan and announce your downtime in the form of maintenance windows, customer dissatisfaction only increases further.

Thanks to the tools and services available today for modern application development, there should be no reason for a digital application to need downtime for maintenance or upgrades. Virtually any upgrade today can take place without downtime – even those that involve changes to the database schema and other data migration tasks. Maintenance tasks can also be completed while the application continues to run.
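To make the schema-change claim concrete, here is a minimal sketch of one common way to do it, often called the "expand and contract" pattern. This is an illustration chosen for this example, not a technique named in the article. The `users` table, the `name` and `full_name` columns and the helper functions are hypothetical, and an in-memory SQLite database stands in for a production system.

```python
import sqlite3

# Hypothetical starting point: a live table whose "name" column
# should become "full_name" without taking the application offline.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("Ada",), ("Grace",), ("Linus",)])
conn.commit()

# Step 1 (expand): add the new column as nullable.
# Old application code that only knows "name" keeps working.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
conn.commit()


def create_user(conn, name):
    """Step 2 (dual write): new application code fills both columns."""
    conn.execute("INSERT INTO users (name, full_name) VALUES (?, ?)",
                 (name, name))
    conn.commit()


def backfill(conn, batch_size=2):
    """Step 3 (backfill): copy existing rows in small batches,
    committing after each one so every transaction stays short."""
    while True:
        cur = conn.execute(
            "UPDATE users SET full_name = name WHERE id IN ("
            "SELECT id FROM users "
            "WHERE full_name IS NULL AND name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break


create_user(conn, "Margaret")  # traffic continues during the migration
backfill(conn)

# Step 4 (switch reads): once no row is missing the new value,
# readers can move to "full_name".
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE full_name IS NULL").fetchone()[0]
print("rows still to migrate:", remaining)

# Step 5 (contract): drop the old column in a later release,
# after no deployed code reads or writes it any more.
```

The point of the sketch is that every step is backward compatible with the code already running, so at no stage does the application have to be stopped. A real migration would also need to handle updates to existing rows during the dual-write phase.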

If your application really does need maintenance windows because of a legacy architecture problem, you should treat that as a problem. It is technical debt that costs your company money, and you should address it urgently. Customers, after all, do not care why an application is not working. They only care that it is down. As your application grows and thrives, regular downtime may also become increasingly difficult to justify to customers. Building systems and processes that do not require maintenance windows also encourages best practices in development, deployment and operations. We developers tend to get lazy when we have maintenance windows available.

Admittedly, designing and implementing changes without a maintenance window takes extra time and thought. At the same time, however, it also promotes attention to detail. When developers have to think about the operational impact of a change, there are usually fewer operational issues. If, on the other hand, they rely on maintenance windows, overall quality and availability suffer.

Even if you currently use easily identifiable, low-utilization time slots to shut down your application, no one can guarantee that such slots will still exist in the future. International expansion, a broader product range or a growing customer base can quickly make 24×7 availability mandatory for you too.

A former customer of mine scheduled a two-hour maintenance window once a week to perform upgrades and adjustments. The problem: the maintenance window itself already takes a significant bite out of availability. Two hours out of the 168 hours in a week caps availability at 98.8%. Compared to other online applications, this is alarmingly low. For example, the Amazon S3 service guarantees 99.99% availability, which allows a maximum downtime of just over 60 seconds per week. To comply with this SLA consistently, AWS cannot schedule any downtime for S3 maintenance. Any such outage would mean that the contractual SLAs are not met.
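The 98.8% figure can be checked with a few lines of arithmetic. The sketch below assumes that the weekly maintenance window is the only downtime, which is the best case. The function name `max_availability` is made up for this example.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def max_availability(window_hours, period_hours=HOURS_PER_WEEK):
    """Best-case availability in percent, assuming the planned
    maintenance window is the only downtime in the period."""
    if not 0 <= window_hours <= period_hours:
        raise ValueError("window must fit inside the period")
    return 100.0 * (1 - window_hours / period_hours)


# A two-hour window once a week:
print(f"{max_availability(2):.1f}%")  # 98.8%
```

Any unplanned outage comes on top of this, so the availability actually achieved can only be lower.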

If Amazon S3 is down for just 4.3 minutes in a given month, AWS refunds 10 percent of the storage costs for the entire month. That adds up to a considerable sum. And this does not only apply to S3, but to AWS as a whole. The commitment to availability is anchored in employees’ minds. They build everything in such a way that no downtime is required, no matter what changes are involved.

Admittedly, not every business needs 99.99% uptime to be successful. But even at a lower percentage, there is little room for planned maintenance windows, because:

  • 99% availability means a maximum downtime of about 1.7 hours per week.
  • 99.9% availability means a maximum downtime of about 10 minutes per week.
  • 99.99% availability means a maximum downtime of just over 60 seconds per week.
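The weekly budgets in this list can be derived from the availability target alone. The following sketch shows the calculation. The function name `weekly_downtime_budget` is an assumption made for this example.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def weekly_downtime_budget(availability_percent):
    """Seconds of downtime per week that an availability target allows."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return SECONDS_PER_WEEK * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    seconds = weekly_downtime_budget(target)
    print(f"{target}% -> {seconds / 3600:.2f} h "
          f"= {seconds / 60:.1f} min = {seconds:.0f} s per week")
```

Each additional nine cuts the budget by a factor of ten: from about 1.68 hours to about 10 minutes to about 60 seconds. A single two-hour maintenance window therefore already exceeds the weekly budget of a 99% target.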

This post is based on an article from our US sister publication Infoworld.

*Lee Atchison is an expert in cloud computing and application modernization with more than 30 years of professional experience. Among other things, he writes for our US sister publication
