
Global Resiliency: Safeguarding Critical Infrastructure Amid Cloud Outages

Published October 16, 2024

The recent global outages caused by CrowdStrike served as a wake-up call for many organizations. Boards are asking CIOs how they can mitigate the next global outage, which could disrupt their mission-critical applications through no fault of their own. Governments around the world are also asking how to prevent collateral damage from such global disruptions to essential public services, especially for critical information infrastructure (CII) such as banking, transport, and healthcare.

According to a recent Gartner report on building digital resilience, the vast majority of organizations (88%) have a defined digital resilience strategy in place. However, global outages at cloud service providers (CSPs) and Software as a Service (SaaS) security providers continue to cause collateral damage to organizations. This suggests that the digital resiliency strategies most organizations have in place today may not account for single points of failure at CSPs and SaaS security providers.

Unplanned application downtime isn’t just a breach of compliance; it can lead to dissatisfaction that could send your customers running to competitors. Downtime therefore causes losses on multiple levels. Conversely, safeguarding your infrastructure and applications can improve compliance, deliver a more satisfying customer experience, and help reduce infrastructure costs.

Understanding the total cost of on-premises operations versus those in the cloud is key. The cloud can be cost effective because of its elasticity: it accommodates usage surges and reduces costs with its pay-as-you-go model. If a mission-critical application deployment adopts an active-active hybrid cloud design pattern, and the lifecycle of the existing hardware is factored into the calculation, you can see substantial cost savings: up to 75% for artificial intelligence (AI) workloads, according to Dell research.

What is global resiliency?

Global resiliency refers to an organization’s ability to withstand, adapt to, and recover from global infrastructure failures and cyberattacks. It involves developing strategies, capabilities, and infrastructure to prevent, detect, respond to, and recover from global outages.

A key aspect of global resiliency is maintaining a robust infrastructure, where IT systems and networks are flexible, scalable, and capable of handling unexpected loads or failures. This is achieved by leveraging multicloud environments and maximizing the value offered by the cloud.

Equally important is maintaining the highest efficacy in cybersecurity. Strong security measures must be implemented to protect against cyber threats while ensuring data integrity and availability—without introducing single points of failure. It’s critical to recognize that many cloud-based cybersecurity SaaS solutions are architecturally single points of failure, meaning when they go down, their customers suffer collateral impacts.
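To make the single-point-of-failure argument concrete, here is a minimal sketch of the standard availability arithmetic. It assumes component failures are independent, which real outages often are not, and the availability figures are hypothetical values chosen only for illustration. The names `series_availability` and `parallel_availability` are ours, not from any product.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Availability of redundant components where any single one suffices."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures, for illustration only.
app = 0.9999            # the application itself
security_saas = 0.999   # an inline cloud security service

# The application sits behind one security service: both must be up.
single_path = series_availability([app, security_saas])

# Two independent security paths: either one can carry the traffic.
redundant_security = parallel_availability([security_saas, security_saas])
dual_path = series_availability([app, redundant_security])

print(f"single security path: {single_path:.6f}")
print(f"redundant security paths: {dual_path:.6f}")
```

Under these assumptions, an inline dependency can never make the application more available than the dependency itself, which is why removing it as a single point of failure matters more than improving the application alone.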

Adaptable processes are also essential for global resiliency. Businesses need to develop flexible operational workflows that can swiftly adjust to changes in the environment, market conditions, or technology. This ensures the ability to pivot quickly in the face of new challenges.

What can enterprises do to be more globally resilient?

Enterprises need to focus on three key actions when architecting globally resilient applications.

  1. Categorize applications into four tiers
    Start by identifying and categorizing your applications into the following tiers:

    • Mission-critical applications: Require global resiliency, ensuring they are always operational, no matter the circumstances.
    • Business-critical applications: Global resiliency is optional but recommended to reduce disruptions.
    • Business-operational applications: Maintain regular operations but do not require global resiliency.
    • Administrative applications: Non-essential applications that support business functions but aren’t pivotal for immediate continuity.
  2. Map global resiliency design patterns to each application tier
    Depending on the tier, enterprises can implement different resiliency patterns:

    • Distributed deployment:
      • Tiered hybrid: Front-end applications are deployed in the cloud, while existing back-end systems remain on-premises.
      • Partitioned hybrid: Combines public cloud and on-premises in an active-active deployment, providing resiliency against single-site failures and optimizing costs.
      • Analytics hybrid: Separates online transaction processing (OLTP) and online analytical processing (OLAP) tasks, allowing the public cloud to handle complex analytics while maintaining core operations on-premises.
      • Edge hybrid: Manages time-sensitive, business-critical workloads locally (e.g., AI inference at the network's edge) while using cloud or on-premises systems for other tasks.
         
    • Redundant deployment:
      • Redundant pattern: Distributes workloads across different clouds or environments based on production and development needs.
      • Business continuity hybrid pattern: Utilizes public cloud failover for cost-effective cold standby systems.
      • Cloud bursting pattern: Handles baseline workloads privately and bursts to the cloud for extra capacity when needed.
  3. Tailor global resiliency reference architectures to each tier
    Enterprises should establish a reference architecture based on these application tiers. This serves as a strategic guide for deploying both existing and new workloads, shortening time-to-value and aligning technical and business resiliency needs. For mission-critical applications, a "partitioned hybrid" design pattern is essential. This means deploying the same front end in both on-premises and cloud environments to ensure resiliency against single-site failures.

By following these steps, businesses can not only safeguard their operations but also gain the flexibility needed to thrive in a global, cloud-driven environment.
Global resiliency can be enhanced by tailoring resiliency reference architectures to each tier.
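
The three steps above can be sketched as a simple lookup from application tier to reference pattern. Only the mission-critical to partitioned hybrid mapping comes from the text; the other entries, the `Tier` and `pattern_for` names, and the application inventory are hypothetical placeholders that each organization would fill in for itself.

```python
from enum import Enum


class Tier(Enum):
    MISSION_CRITICAL = "mission-critical"
    BUSINESS_CRITICAL = "business-critical"
    BUSINESS_OPERATIONAL = "business-operational"
    ADMINISTRATIVE = "administrative"


# Step 2: map each tier to a design pattern.
# Only the first entry is stated in the article; the rest are assumptions.
REFERENCE_PATTERN = {
    Tier.MISSION_CRITICAL: "partitioned hybrid",
    Tier.BUSINESS_CRITICAL: "business continuity hybrid",  # assumed example
    Tier.BUSINESS_OPERATIONAL: None,  # global resiliency not required
    Tier.ADMINISTRATIVE: None,        # global resiliency not required
}


def pattern_for(tier):
    """Return the reference pattern for a tier, or None if none is required."""
    return REFERENCE_PATTERN[tier]


# Step 1: a hypothetical application inventory, categorized by tier.
inventory = {
    "payments-api": Tier.MISSION_CRITICAL,
    "customer-portal": Tier.BUSINESS_CRITICAL,
    "internal-wiki": Tier.ADMINISTRATIVE,
}

# Step 3: derive the deployment guidance for each application.
for app, tier in inventory.items():
    pattern = pattern_for(tier) or "no global resiliency pattern required"
    print(f"{app} ({tier.value}): {pattern}")
```

Keeping the mapping in one table means a new workload only needs a tier assignment to inherit its deployment guidance.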

Are there frameworks on global resiliency?

There are several frameworks and models for digital resiliency that organizations can adopt to enhance their ability to respond to and recover from disruptions. Some of the notable frameworks include:

  • National Institute of Standards and Technology (NIST) Cybersecurity Framework
  • International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 27001
  • Control Objectives for Information and Related Technologies (COBIT)
  • Information Technology Infrastructure Library (ITIL)
  • Business Continuity Management (BCM) Framework
  • Digital Operational Resilience Act (DORA)
  • Capability Maturity Model Integration (CMMI)

By adopting these frameworks, organizations can create a structured approach to enhance their digital resiliency and better prepare for potential disruptions.

Key strategies for global resiliency

Ensuring global resiliency requires high availability, scalability, and robust security for applications. Organizations can achieve this by leveraging key technologies that boost both performance and protection.

ADCs: F5's BIG-IP app delivery controller (ADC), F5 NGINX ADC, and distributed cloud Application Delivery Controller as a Service (ADCaaS) can optimize traffic distribution and scale applications across data centers, clouds, and hybrid environments to ensure availability and performance.
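
As a rough illustration of the idea behind health-aware traffic distribution in an active-active deployment, the sketch below picks a site by weight while skipping any site marked unhealthy. It is not how any particular ADC is configured or implemented; the `Site` type, the `pick_site` function, and the site names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    weight: int
    healthy: bool = True


def pick_site(sites, rng=random):
    """Weighted random choice among healthy sites; raises if none is healthy."""
    candidates = [s for s in sites if s.healthy and s.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy site available")
    weights = [s.weight for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical active-active pair: one on-premises site, one cloud site.
demo_sites = [Site("on-prem-dc", weight=1), Site("public-cloud", weight=1)]
print(pick_site(demo_sites).name)

# Simulate a single-site failure: all traffic shifts to the surviving site.
demo_sites[0].healthy = False
print(pick_site(demo_sites).name)
```

In practice the `healthy` flag would be driven by periodic health checks, and the weights could reflect capacity or cost at each site.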

Cybersecurity: Tools like web application firewalls (WAFs), application programming interface (API) security, and denial-of-service (DoS) protection safeguard applications from cyber threats, ensuring continuity even during attacks.

Cloud and hybrid deployments: Multicloud networking and hybrid setups improve flexibility, enabling swift response to disruptions.

Automation and orchestration: Automating application delivery and security reduces errors and shortens response times, which enhances resiliency.

Visibility and analytics: Real-time monitoring and analytics allow proactive responses to performance issues and security threats.
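
One simple form of such monitoring is a sliding-window error rate that triggers an alert before an issue becomes a full outage. The sketch below assumes request outcomes are reported one at a time; the `ErrorRateMonitor` class, the window size, and the threshold are illustrative choices, not recommendations.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the most recent `window` requests."""

    def __init__(self, window=100, threshold=0.05):
        self.results = deque(maxlen=window)  # oldest results drop off automatically
        self.threshold = threshold

    def record(self, ok):
        """Record one request outcome: True for success, False for failure."""
        self.results.append(bool(ok))

    def error_rate(self):
        """Fraction of failed requests in the current window (0.0 if empty)."""
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_alert(self):
        """True when the error rate exceeds the configured threshold."""
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for outcome in [True] * 8 + [False] * 3:
    monitor.record(outcome)
print(monitor.error_rate(), monitor.should_alert())
```

An alert from a monitor like this could feed the automation described above, for example by marking a site unhealthy so that traffic is steered elsewhere.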

By implementing these technologies, organizations can ensure their applications remain available, scalable, and secure in an ever-changing digital environment.

Building a comprehensive global resiliency strategy

In today's interconnected world, building global resiliency is crucial for maintaining the integrity of applications. By focusing on key areas like application delivery, robust cybersecurity, and adaptable cloud strategies, organizations can better protect their services from disruptions and scale to meet evolving demands. Implementing automation and gaining real-time visibility into system performance can further strengthen resiliency efforts. With a comprehensive, well-thought-out approach, businesses can ensure that their applications remain reliable, secure, and ready to meet the challenges of tomorrow.

Chat with us at GovWare in Singapore from October 15-17 at Sands Expo Convention Center at Booth P06, where we will share insights on how you can build and strengthen cyber and cloud resiliency, and secure, deliver, and optimize apps anywhere.