The importance of SD-WAN redundancy to application uptime

Downtime is the bane of CIOs, IT managers, and sysadmins around the globe. Downtime brings productivity to a halt, costs an enterprise money, and often leaves IT in the hot seat to explain what happened. To make things tougher, the numbers on downtime are getting worse. A survey released earlier this year showed that 86% of organizations report the cost of an hour of downtime exceeds $300,000 USD, up 5% from the 2018 survey.

So what can IT do to reduce the risk of downtime? There isn’t a single, one-size-fits-all answer. Maximizing uptime requires a mix of sound security practices, infrastructure design, and monitoring. One of the cornerstones of uptime maximization is redundancy.

Redundancy is particularly important on the enterprise WAN, and getting it right can be the difference between a costly outage and a crisis averted. In this piece, we’ll explore a few approaches to redundant WAN architecture, examine the shortcomings of legacy solutions like MPLS (Multiprotocol Label Switching), and explain the advantages of cloud-based SD-WAN redundancy.

Approaches to WAN redundancy & failover

Broadly, there are a few approaches to redundancy in the WAN. The first and most basic is no redundancy at all. While this isn’t suitable for critical production workloads, it gives us a baseline. With a “no redundancy” approach, you have exactly the hardware, software, and infrastructure needed to get your WAN up and running, so any failure leads to downtime.

The next step up is N+1 redundancy. With N+1, you deploy one more of each required component than the WAN needs to stay online. For example, if you need 3 routers, N+1 gives you 4, leaving a spare if one fails. Beyond N+1 is 2N, which doubles everything you need, so even if all of your production infrastructure failed, you could get back up and running on the redundant set. In the 3-router example, 2N means 6 routers.
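
The router arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming that any surviving router can take over for any failed one (real 2N designs usually pair two complete, independent systems); the `RedundancyPlan` class and `plan` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyPlan:
    scheme: str
    required: int  # N: routers needed to carry the production load
    deployed: int  # routers actually installed

    @property
    def spares(self) -> int:
        return self.deployed - self.required

    def survives(self, failures: int) -> bool:
        # Simplifying assumption: any surviving router can replace any failed one.
        return self.deployed - failures >= self.required


def plan(scheme: str, n: int) -> RedundancyPlan:
    # Total routers deployed under each redundancy scheme.
    totals = {"none": n, "N+1": n + 1, "2N": 2 * n}
    if scheme not in totals:
        raise ValueError(f"unknown scheme: {scheme}")
    return RedundancyPlan(scheme, n, totals[scheme])


for scheme in ("none", "N+1", "2N"):
    p = plan(scheme, 3)
    print(f"{scheme}: {p.deployed} routers, tolerates {p.spares} failure(s)")
```

For the 3-router example this prints 3, 4, and 6 routers tolerating 0, 1, and 3 failures respectively.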

Beyond the architectural approach, enterprises also need to consider how failover to redundant links occurs. On the WAN, active-active failover means load balancing traffic across two links that are both in use. Because both links are already carrying traffic, losing one simply shifts its load to the other, so failover is seamless and there is little to no downtime. With active-passive failover, only one link is active at a time. Active-passive failover is usually driven by route or DNS convergence, which generally means a transition period during which VoIP and videoconferencing sessions drop and applications go down.
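
One way to see the difference is to count how many flows have to move when a link fails. The sketch below is a simplified model, not any vendor’s implementation: it assumes per-flow hashing (CRC32 of a flow ID) for active-active load balancing, and the link names `isp-a` and `isp-b` are hypothetical. It models only which flows are affected, not timing.

```python
import zlib

LINKS = ["isp-a", "isp-b"]  # hypothetical link names; isp-a is the primary


def pick_link(flow_id: str, links: list[str]) -> str:
    """Per-flow hashing: a given flow always lands on the same link."""
    return links[zlib.crc32(flow_id.encode()) % len(links)]


def assign(flows: list[str], mode: str) -> dict[str, str]:
    if mode == "active-active":
        # Both links carry traffic; flows are spread across them.
        return {f: pick_link(f, LINKS) for f in flows}
    if mode == "active-passive":
        # Only the primary carries traffic; the secondary sits idle.
        return {f: LINKS[0] for f in flows}
    raise ValueError(f"unknown mode: {mode}")


def fail_over(assignment: dict[str, str], failed: str) -> dict[str, str]:
    """Move only the flows that were on the failed link; leave the rest alone."""
    survivors = [link for link in LINKS if link != failed]
    return {
        f: link if link != failed else pick_link(f, survivors)
        for f, link in assignment.items()
    }


flows = [f"flow-{i}" for i in range(100)]
for mode in ("active-active", "active-passive"):
    before = assign(flows, mode)
    after = fail_over(before, failed="isp-a")
    moved = sum(1 for f in flows if before[f] != after[f])
    print(f"{mode}: {moved} of {len(flows)} flows had to move")
```

When the primary fails, active-passive has to move every flow, while active-active moves only the share that was hashed onto the failed link; the other flows keep running on a link that was already verified to be carrying traffic.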

The last mile: where MPLS comes up short

With those architectural approaches in mind, we can compare legacy MPLS redundancy with SD-WAN redundancy in the last mile. The last mile matters because it’s the hardest stretch of the WAN to account for. MPLS is known for a reliable middle mile (the portion of the network that runs across provider infrastructure). The last mile, the link between the service provider’s network and the enterprise site, is where many performance and connectivity issues arise.

Generally speaking, while 2N or N+1 is possible with MPLS, it cannot deliver true active-active failover. Instead, failover is achieved by configuring dual paths and using a load balancer, and it operates in active-passive mode. As a result, MPLS failover inherits the drawbacks of route or DNS convergence: a transition period, dropped VoIP and videoconferencing sessions, and downtime.
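
A rough way to reason about the active-passive transition period is to add the time needed to declare the primary path dead to the time needed to switch over. The sketch below assumes a simple probe-based detector, and every number in it (1-second probes, 3 missed probes, 10 seconds of route convergence, a 300-second DNS TTL) is hypothetical, chosen only to illustrate the arithmetic.

```python
def failover_gap_s(probe_interval_s: int, missed_probes: int, switchover_s: int) -> int:
    """Rough estimate of interruption: time to declare the primary dead + time to switch."""
    detection_s = probe_interval_s * missed_probes
    return detection_s + switchover_s


# All values below are hypothetical, for illustration only.
route_gap = failover_gap_s(probe_interval_s=1, missed_probes=3, switchover_s=10)  # routing reconverges
dns_gap = failover_gap_s(probe_interval_s=1, missed_probes=3, switchover_s=300)   # clients wait out a 300 s TTL

print(f"route convergence: ~{route_gap} s without connectivity")
print(f"DNS failover:      ~{dns_gap} s for clients holding a cached record")
```

With these assumptions, route-based failover leaves a gap of about 13 seconds and DNS-based failover about 303 seconds for a client that cached the old record just before the failure; either is long enough to drop a voice or video call.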

Active-active failover and the benefits of cloud-based SD-WAN redundancy

Premium cloud-based SD-WAN (not appliance-based SD-WAN with no underlying private network) meets or exceeds MPLS performance in the middle mile thanks to a global, SLA-backed private backbone. With a network of PoPs (Points of Presence) around the world interconnected by Tier-1 ISPs, cloud-based SD-WAN reliably delivers the middle-mile connectivity enterprises demand.

The difference really shows in the last mile. Cloud-based SD-WAN can do active-passive failover, but it can also implement active-active failover, because intelligent cloud-native software handles link monitoring and load balancing. In many failover scenarios, network managers have to account for IP address and security policy conflicts, particularly when failing over to a different ISP. With cloud-based SD-WAN, Network Address Translation (NAT) occurs at the PoP, not at the ISP, so the public address that remote services see stays the same no matter which ISP link carries the traffic. The result is a seamless transition that doesn’t create conflicts or compromise the WAN.
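
The effect of performing NAT at the PoP can be illustrated with a remote service that only accepts allow-listed source addresses. This is a simplified model under stated assumptions: the PoP and ISP addresses are hypothetical values taken from the RFC 5737 documentation ranges, and `IspLink` and `source_ip_seen_by_server` are names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IspLink:
    name: str
    wan_ip: str  # address the ISP assigns to the site's edge device


# Hypothetical addresses from the RFC 5737 documentation ranges.
POP_EGRESS_IP = "203.0.113.10"
ISP_A = IspLink("isp-a", "192.0.2.20")
ISP_B = IspLink("isp-b", "198.51.100.30")


def source_ip_seen_by_server(link: IspLink, nat_at_pop: bool) -> str:
    if nat_at_pop:
        # Traffic is tunneled over the ISP link to the PoP, which translates
        # it to the PoP's own public address before it reaches the Internet.
        return POP_EGRESS_IP
    # Local breakout: traffic is translated to the ISP-assigned address.
    return link.wan_ip


for nat_at_pop in (False, True):
    allowlist = {source_ip_seen_by_server(ISP_A, nat_at_pop)}  # configured while on ISP A
    after = source_ip_seen_by_server(ISP_B, nat_at_pop)        # source address after failover to ISP B
    label = "NAT at PoP" if nat_at_pop else "NAT at site edge"
    print(f"{label}: still allowed after failover? {after in allowlist}")
```

This prints `False` for NAT at the site edge and `True` for NAT at the PoP: when translation happens at the PoP, the allow-list rule keeps matching after the site fails over to a different ISP.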

Going a step further, the intelligent last-mile management (ILMM) functionality in cloud-based SD-WAN enables brownout detection, so the network can respond quickly to degradations, such as packet loss or high latency, that don’t bring a link down completely.
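
Brownout detection boils down to distinguishing a link that is down from one that is up but performing poorly, then steering traffic accordingly. The sketch below is a generic illustration, not how any particular ILMM feature works: it classifies each link from a window of probe results, and the thresholds (2% loss, 150 ms average RTT), link names, and probe data are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProbeWindow:
    sent: int              # probes sent over a recent interval
    received: int          # probes that came back
    rtts_ms: list[float]   # round-trip times of the probes that came back


# Hypothetical thresholds, for illustration only.
MAX_LOSS = 0.02          # more than 2% loss counts as degraded
MAX_AVG_RTT_MS = 150.0   # average RTT above 150 ms counts as degraded


def classify(w: ProbeWindow) -> str:
    if w.received == 0:
        return "down"       # hard failure: nothing is getting through
    loss = 1 - w.received / w.sent
    if loss > MAX_LOSS or mean(w.rtts_ms) > MAX_AVG_RTT_MS:
        return "brownout"   # link is up but not performing
    return "healthy"


def preferred_links(windows: dict[str, ProbeWindow]) -> list[str]:
    """Steer traffic to healthy links first; fall back to degraded ones if none are healthy."""
    states = {name: classify(w) for name, w in windows.items()}
    healthy = [n for n, s in states.items() if s == "healthy"]
    if healthy:
        return healthy
    return [n for n, s in states.items() if s == "brownout"]


windows = {
    "isp-a": ProbeWindow(sent=100, received=99, rtts_ms=[20.0] * 99),
    "isp-b": ProbeWindow(sent=100, received=90, rtts_ms=[35.0] * 90),  # 10% loss
}
print({name: classify(w) for name, w in windows.items()})
print(preferred_links(windows))
```

Here `isp-b` is still up, so a simple up/down check would keep using it, but its 10% loss marks it as a brownout and traffic is steered to `isp-a`.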

Cloud-based SD-WAN helps enterprises maximize uptime

In the last mile, rapid and seamless failover can be the difference between application downtime and continued productivity. As businesses grow more dependent on cloud services and public Internet connectivity, WAN uptime matters more than ever. Premium cloud-based SD-WAN provides an alternative to MPLS that can deliver true active-active failover, maximizing the benefits of a 2N or N+1 WAN architecture. The result is less downtime and better network performance.