What is RTO / RPO Redefined: The Race to Zero.

Oct 3, 2018, 09:15 AM by Trenton Baker

  • Question #1: Is it technically possible to get your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) to zero or near-zero?
    • Answer: Yes.

  • Question #2: Do you really want to?
    • Answer: Maybe not.

Wait, why shouldn’t you get RTO and RPO to zero or near-zero?


Understanding RPO and RTO

Well-run IT organizations use data protection best practices to evaluate the risk of data loss and establish IT Resilience (ITR) policies to ensure business continuity. CIOs and IT managers must start with a common understanding of what RPO and RTO mean in storage, particularly for backup and disaster recovery.

Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are two of the most critical parameters of a data protection plan and disaster recovery strategy. These measurements are related and necessary to application and data availability. Despite their similarities, RPO and RTO serve different purposes and come with different metrics.

  • Recovery Point Objective = Data Risk. RPO refers to the maximum acceptable amount of data loss an application can undergo before causing measurable harm to the business.
  • Recovery Time Objective = Downtime. RTO states how much downtime an application can experience before there is a measurable business loss.

Understanding RPO and RTO
Courtesy of Enterprise Storage by Christine Taylor

Recovery Objectives

Email is a business-critical application, but it can typically be unavailable for an hour or so before most businesses record a productivity loss. (Employee complaints are another story.) By contrast, a customer transaction database may cause financial and reputational damage after less than two minutes of downtime.

  • RPO Example: If the last available copy of data during an outage is from 18 hours ago, and the RPO for this business is 20 hours, then we are still within the parameters of the IT resilience policy’s RPO. The RPO then answers the question – “Up to what point in time could the recovery proceed acceptably, given the volume of data lost during that period?”
  • RTO Example: If your RTO is five hours, meaning your business can survive downtime for that interval, then your ITR policy must call for a high level of preparation so that systems can be recovered quickly. Conversely, if the RTO is two weeks, then different data protection plans can be developed to restore data availability.
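
The two examples above reduce to simple time comparisons. The following is a minimal Python sketch of that arithmetic, not part of any product: the function names and timestamps are hypothetical, and the four-hour restore time is an assumed value chosen only to illustrate a recovery that fits inside the five-hour RTO.

```python
from datetime import datetime, timedelta


def within_rpo(last_good_copy: datetime, outage_start: datetime, rpo: timedelta) -> bool:
    """Return True if the data lost since the last good copy fits inside the RPO."""
    data_loss_window = outage_start - last_good_copy
    return data_loss_window <= rpo


def within_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """Return True if the downtime fits inside the RTO."""
    downtime = service_restored - outage_start
    return downtime <= rto


# Hypothetical timestamps chosen to mirror the examples above.
outage = datetime(2018, 10, 3, 9, 15)
last_copy = outage - timedelta(hours=18)   # last available copy is 18 hours old
restored = outage + timedelta(hours=4)     # assumed: service is back after 4 hours

rpo_ok = within_rpo(last_copy, outage, rpo=timedelta(hours=20))
rto_ok = within_rto(outage, restored, rto=timedelta(hours=5))
print(rpo_ok, rto_ok)  # True True
```

RPO looks backward from the outage (how old is the newest usable copy?), while RTO looks forward from it (how long until service is back?). That is why the two objectives need separate measurements.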

It is best practice to match recovery time and point objectives (RTPO) to the application priority. For rare mission-critical applications, minimizing risk may require zero/near-zero RPOs and RTOs, despite the expense.


Why Zero RTPO Costs

The best way to achieve zero/near-zero RPO and RTO is synchronous mirroring. It works by synchronously writing I/O from the primary storage media to a second mirrored system, and waiting on acknowledgment before writing the next I/O set from primary to the mirrored system. The secondary copy is then stored in an active state for immediate recovery; think high availability (HA) in a dual-node clustered server.
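
As a rough illustration of the write path described above, here is a toy Python sketch. The `Volume` and `SyncMirror` classes are hypothetical in-memory stand-ins, not a real storage API; the only point is that a write is not acknowledged to the application until the mirror has acknowledged it too.

```python
class Volume:
    """Toy in-memory block device standing in for a real storage system."""

    def __init__(self) -> None:
        self.blocks = {}  # block_id -> bytes

    def write(self, block_id: int, data: bytes) -> bool:
        self.blocks[block_id] = data
        return True  # acknowledgment that the write was stored


class SyncMirror:
    """Acknowledge a write only after both the primary and the mirror have it."""

    def __init__(self, primary: Volume, secondary: Volume) -> None:
        self.primary = primary
        self.secondary = secondary

    def write(self, block_id: int, data: bytes) -> bool:
        if not self.primary.write(block_id, data):
            return False
        # Wait for the mirror's acknowledgment; the caller cannot issue
        # the next write until this call returns.
        if not self.secondary.write(block_id, data):
            raise IOError("mirror did not acknowledge write")
        return True


primary, secondary = Volume(), Volume()
mirror = SyncMirror(primary, secondary)
for i, payload in enumerate([b"order-1", b"order-2", b"order-3"]):
    mirror.write(i, payload)

# Every acknowledged write exists on both copies, so a failover loses nothing.
print(primary.blocks == secondary.blocks)  # True
```

In a real deployment the wait for the mirror's acknowledgment crosses a network link, so each write pays that round trip.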

This processing-intensive configuration needs high-performance storage systems and maximum bandwidth to minimize performance impacts, which adds management, time, and expense. Additional layers such as database systems, clustering hardware and software, and native database replication features add more cost and complexity. Each layer requires IT experts to configure, integrate, and manage within the RTO/RPO infrastructure.

Achieving zero/near-zero in this environment is possible but not easy. Still, synchronous replication is the best option for highly transactional, mission-critical applications because it does not require data movement, rehydration, or waiting. It is a highly successful, and expensive, means of achieving zero/near-zero recovery objectives.

Instead of Zero, How about 15 Minutes?

What about business-critical applications then? It’s overkill to pay for zero/near-zero RPO and RTO for most applications. However, the average length of recovery objectives is far too long for business-critical applications.

Let’s look at two possibilities that should cover most business applications: zero/near-zero RPO and RTO for the most mission-critical applications, and RTO and RPO of just 15 minutes for everything else that needs short objective times.

If you can get that RPO/RTO down to 1 hour at the slowest or 15 minutes at the fastest, you will achieve quick recovery times and recent recovery points for most of your applications. According to George Crump of Storage Switzerland, the “secret” is in-place recoveries for RTO and change block backups for RPO.

  • RTO: In-Place Recovery. In-place recovery features work by quickly recovering backed-up data from backup devices, without having to reconstitute data from the backup format. Bandwidth can be a stumbling block and will determine how fast this works, so be sure that your infrastructure can support remote or cloud backup. You’ll achieve the fastest RTO by caching active backup data on-premises for in-place recovery.
  • RPO: Change Block Backups. By backing up changed blocks only, you can back up frequently without impacting network performance. This means the amount of data that changes between backups is minimal, allowing for much lower RPOs.
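
To make the change block idea concrete, the Python sketch below compares per-block hashes against the previous backup and keeps only the blocks that differ. It is a simplified stand-in: the function names and the four-byte block size are assumptions for illustration, and real backup products generally track changed blocks as writes happen rather than rehashing every block.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut the data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hashes(data: bytes) -> list:
    """Hash every block; this is what the previous backup remembers."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(data)]


def changed_blocks(previous_hashes: list, data: bytes) -> dict:
    """Return {block_index: block_bytes} for blocks that differ from the last backup."""
    changed = {}
    for index, block in enumerate(split_blocks(data)):
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = block
    return changed


full_backup = b"AAAABBBBCCCCDDDD"
hashes = block_hashes(full_backup)

current = b"AAAABBBBXXXXDDDD"  # only the third block has changed
delta = changed_blocks(hashes, current)
print(delta)  # {2: b'XXXX'}
```

Only one of the four blocks crosses the network here. Small deltas like this are what make frequent backups, and therefore short RPOs, practical.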

You will still want to assign a frequency of block backups and decide on priority data to cache, so not every RPO or RTO will be 15 minutes or less. However, you certainly can achieve 15 minutes for the applications that need it, and an hour or less for the rest, at a very reasonable price.


RTO and RPO graph Courtesy of Veeam.

How much time and data can a business save? A Veeam survey on RPO and RTO, “Using Veeam in the New Race to Zero: Customer Survey Results”, reported that respondents reduced restore time by 77% and saved 10.2 hours per year of downtime in their virtual environments. According to IDC, downtime costs enterprises an average of $100,000 per hour. Veeam’s customer survey demonstrates that new RPO and RTO technologies deliver better recovery objectives at a much lower cost.
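
For a sense of scale, the two figures quoted above can be multiplied together. This is only a back-of-the-envelope Python sketch: it combines numbers from two different sources (the Veeam survey and the IDC average), so treat the result as an illustration, not a reported finding. The variable names are hypothetical.

```python
HOURS_SAVED_PER_YEAR = 10.2       # downtime hours saved, from the Veeam survey quoted above
COST_PER_DOWNTIME_HOUR = 100_000  # average cost per downtime hour, from the IDC figure quoted above

avoided_cost = HOURS_SAVED_PER_YEAR * COST_PER_DOWNTIME_HOUR
print(f"${avoided_cost:,.0f} per year")  # $1,020,000 per year
```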


RTPO Next Steps

There are on-demand clouds that offer up a DIY environment, and then there are custom cloud service providers that can do it all, without the hidden costs. One indication that the vendor does it all is that they are willing to provide DRaaS and custom SLAs to meet an IT resilience policy. If your data backup and recovery cloud cannot do that, then you need to revisit your data protection plan to obtain a custom-crafted solution that provides measurable, trustworthy, and repeatable RPO and RTO metrics.

The KeepItSafe cloud is in rare air in its ability to provide the support, expertise, and agility required to deliver cloud backup, DRaaS, and IT Resilience. Whether you are a small business, a mid-market company, or a Fortune 1000 enterprise, KeepItSafe has the backup and recovery features that will meet your specific needs. And best of all, you can experience all the cost-saving features and time-saving benefits of KeepItSafe by starting with a 30-day trial.

