updated date: 31.Oct.2023

This blog is part of the Business Continuity with RISE and BTP blog series:
part 1 – Concept Explained 👈
part 2 – Technical Building Blocks in RISE
part 3 – Technical Building Blocks in BTP

Nowadays, more and more enterprises have digitalized their business with enterprise solutions such as SAP business applications and platforms. Business continuity for enterprise solutions is the level of readiness of a business to maintain critical functions after an emergency or disruption.

When companies move their enterprise solutions to the cloud (where the dominant infrastructure providers are Azure, AWS, and GCP), they benefit from lower TCO and less operational overhead. The move also gives them the opportunity to redesign business continuity for their business-critical workloads, so that data, applications, and platforms stay available through potential waves of disruption.

Therefore, in this blog series, we go back to the fundamentals and focus on answering two questions: ‘What is it?’ and ‘What makes it possible?’

1. Service Model Explained

Service Model	Responsibility	Notes and Examples
On-Premises Model	Provider provides: software license. Customer manages: Networking, Storage, Servers, Virtualisation, O/S, Middleware, Runtime, Data, Applications	The traditional service model, in which SAP provided enterprise software
Infrastructure as a Service (IaaS)	Customer manages: O/S, Middleware, Runtime, Data, Applications	The dominant IaaS providers in the market are AWS, Azure, and GCP
Platform as a Service (PaaS)	Customer manages: Data, Applications	Examples: SAP BTP (including SAP Analytics Cloud and SAP Datasphere)
Software as a Service (SaaS)	Fully managed service	Examples: SAP Concur, SAP Ariba, SAP SuccessFactors, SAP Fieldglass. RISE with SAP, Private Cloud Edition is a SaaS-like managed service
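The responsibility split in the service model table can be captured as a small lookup. The sketch below assumes the nine-layer stack listed for the on-premises model and derives which layers the provider manages under each model. The names `STACK`, `CUSTOMER_MANAGED`, and `provider_managed` are illustrative only, not part of any SAP or hyperscaler API.

```python
# Layers of the stack, bottom to top, as listed for the on-premises model.
STACK = [
    "Networking", "Storage", "Servers", "Virtualisation",
    "O/S", "Middleware", "Runtime", "Data", "Applications",
]

# Layers the customer manages under each service model (per the table above).
CUSTOMER_MANAGED = {
    "On-Premises": set(STACK),
    "IaaS": {"O/S", "Middleware", "Runtime", "Data", "Applications"},
    "PaaS": {"Data", "Applications"},
    "SaaS": set(),  # fully managed service
}


def provider_managed(model: str) -> list[str]:
    """Return the layers the provider manages for a service model, in stack order."""
    customer = CUSTOMER_MANAGED[model]
    return [layer for layer in STACK if layer not in customer]


if __name__ == "__main__":
    for model in CUSTOMER_MANAGED:
        print(f"{model:12} provider manages: {provider_managed(model)}")
```

Moving from on-premises towards SaaS, the provider-managed list grows, which is also where more of the business continuity responsibility shifts to the provider.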

2. Business Continuity Explained

Business continuity is mainly ensured by four key parts: High Availability, Disaster Recovery, Data Management, and Change Management.

High Availability (HA) is about recovering from errors in single entities, typically a broken server or network switch. HA is measured by the Service Level Agreement (SLA): the higher the SLA promised for a system, the less downtime it is allowed to suffer.
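To make the relationship between SLA and downtime concrete, the sketch below converts an availability percentage into the maximum downtime it permits over a period. The 30-day period is an assumption for illustration; real SLAs define their own measurement windows and exclusions.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Maximum downtime, in minutes, that an SLA permits over `period_days` days."""
    unavailability = 1 - sla_percent / 100
    return unavailability * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    for sla in (99.0, 99.5, 99.9, 99.95, 99.99):
        print(f"{sla:>6}% -> {allowed_downtime_minutes(sla):7.1f} min per 30 days")
```

For example, 99.9% over 30 days allows about 43.2 minutes of downtime, while 99.99% allows only about 4.3 minutes.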

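Disaster Recovery (DR) represents the resilience of the application or system foundation after an entire system fails, for example in catastrophic events like earthquakes or floods. DR is measured by the Recovery Time Objective (RTO), the maximum acceptable time to restore service, and the Recovery Point Objective (RPO), the maximum acceptable data loss measured in time. Disaster Recovery can be built In-Region (two Availability Zones within one Region) or Cross-Region (two Regions).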
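The sketch below shows how the achieved RTO and RPO could be read off the timeline of a DR drill and compared with targets. All timestamps and targets in the example are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recoverable_point: datetime, disaster_time: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable data point and the disaster."""
    return disaster_time - last_recoverable_point


def achieved_rto(disaster_time: datetime, service_restored: datetime) -> timedelta:
    """Downtime window: time between the disaster and the service being restored."""
    return service_restored - disaster_time


def meets_targets(last_point: datetime, disaster: datetime, restored: datetime,
                  rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True if both the achieved RPO and RTO are within their targets."""
    return (achieved_rpo(last_point, disaster) <= rpo_target
            and achieved_rto(disaster, restored) <= rto_target)


if __name__ == "__main__":
    disaster = datetime(2023, 10, 31, 9, 0)            # hypothetical drill start
    last_replicated = disaster - timedelta(minutes=10)
    restored = disaster + timedelta(hours=3)
    print(achieved_rpo(last_replicated, disaster))     # 0:10:00
    print(achieved_rto(disaster, restored))            # 3:00:00
    print(meets_targets(last_replicated, disaster, restored,
                        rpo_target=timedelta(minutes=30),
                        rto_target=timedelta(hours=4)))  # True
```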

Data Management includes Data Backup, Data Replication, Snapshot, and Data Restore. See section 3.4 for details.

Change Management prevents application or system impairments caused by changes (a code change or a system upgrade) to a previously working state. With a standardised and automated pipeline, changes can be better governed and regulated. In section 3.5, we discuss automation in change management.

3. Key Technical Components

3.1. Trustworthy infrastructure

RISE with SAP Private Cloud Edition is built on top of trustworthy infrastructure provided by Azure, AWS, and GCP, which are among the Leaders in the Gartner Magic Quadrant for Cloud Infrastructure and Platform Services.

Hyperscalers (Azure, AWS, GCP) group data centers with cloud computing resources into Availability Zones; several Availability Zones located within close physical distance of each other make up a Region.

Hyperscaler	Infrastructure Overview	Regions and Zones
Microsoft Azure (Azure)	Azure Global Infrastructure and Locations	Region, Availability Zone
Amazon Web Services (AWS)	AWS Global Infrastructure and Locations	Regions and Availability Zones
Google Cloud (GCP)	Google Cloud Global Infrastructure and Locations	Regions and Zones
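The Region and Availability Zone hierarchy is what allows redundant components to survive the loss of a single zone. The minimal sketch below assumes a hypothetical region with three zones named `zone-a`, `zone-b`, and `zone-c`; it spreads replicas round-robin across zones and checks that losing any one zone still leaves at least one replica running. The zone names and both helper functions are illustrative, not a hyperscaler API.

```python
from itertools import cycle


def place_replicas(replicas: list[str], zones: list[str]) -> dict[str, str]:
    """Assign each replica to a zone, round-robin, so replicas are spread across zones."""
    return dict(zip(replicas, cycle(zones)))


def survives_zone_loss(placement: dict[str, str]) -> bool:
    """True if, for every zone in use, at least one replica runs outside that zone."""
    zones = set(placement.values())
    return all(
        any(zone != failed for zone in placement.values())
        for failed in zones
    )


if __name__ == "__main__":
    zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names in one region
    spread = place_replicas(["app-1", "app-2"], zones)
    print(spread)                      # {'app-1': 'zone-a', 'app-2': 'zone-b'}
    print(survives_zone_loss(spread))  # True
    packed = {"app-1": "zone-a", "app-2": "zone-a"}
    print(survives_zone_loss(packed))  # False
```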

3.2. Monitoring, Failover, and Load Balancing

Monitoring refers to the practice of continuously observing the various components and systems involved in a high availability (HA) architecture. Effective monitoring in a high availability environment serves several purposes: early detection of failures, proactive maintenance, performance optimization, and SLA compliance.
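Early failure detection usually relies on repeated health checks rather than a single probe. The sketch below, a hypothetical `HealthMonitor` class, reports a component as down only after a configurable number of consecutive failed checks, so that a single transient error does not trigger a reaction. The threshold of 3 is an assumed value, and the health check itself (for example an HTTP probe) is abstracted as a boolean.

```python
class HealthMonitor:
    """Tracks consecutive health-check failures for one component.

    A component is reported DOWN only after `failure_threshold` consecutive
    failed checks; any passing check resets the counter.
    """

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> str:
        """Record one health-check result and return the current status."""
        if check_passed:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return "DOWN" if self.consecutive_failures >= self.failure_threshold else "UP"


if __name__ == "__main__":
    monitor = HealthMonitor(failure_threshold=3)
    results = [True, False, False, True, False, False, False]
    print([monitor.record(r) for r in results])
    # ['UP', 'UP', 'UP', 'UP', 'UP', 'UP', 'DOWN']
```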

Failover in high availability refers to the process of automatically switching from a failed or degraded primary system to a secondary system in order to maintain uninterrupted service. It is a key component of high availability architecture, which aims to minimize downtime and ensure continuous operation of critical systems.
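A minimal active/standby failover decision can be sketched as follows. The class name `FailoverPair` and the node names are hypothetical; the health results would come from monitoring. One design choice worth noting: this sketch does not fail back automatically when the original primary recovers, which avoids flapping between nodes but is only one of several possible policies.

```python
class FailoverPair:
    """Active/standby pair (hypothetical helper): serve from one node, switch when it fails."""

    def __init__(self, primary: str, secondary: str):
        self.primary = primary
        self.secondary = secondary
        self.active = primary  # start on the primary

    def on_health_update(self, healthy: dict[str, bool]) -> str:
        """Apply the latest health-check results and return the node that should serve traffic."""
        standby = self.secondary if self.active == self.primary else self.primary
        # Switch only if the active node is unhealthy AND the standby is healthy;
        # otherwise keep the current node (no automatic failback in this sketch).
        if not healthy[self.active] and healthy[standby]:
            self.active = standby
        return self.active


if __name__ == "__main__":
    pair = FailoverPair("db-primary", "db-secondary")
    print(pair.on_health_update({"db-primary": True, "db-secondary": True}))   # db-primary
    print(pair.on_health_update({"db-primary": False, "db-secondary": True}))  # db-secondary
    print(pair.on_health_update({"db-primary": True, "db-secondary": True}))   # db-secondary
```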

Load Balancing is the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. A Load Balancer is the entity that implements load balancing, and it applies monitoring and failover: with load balancers, the failure of a primary component can be detected, and traffic can then be redirected to the redundant component.
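The sketch below illustrates a load balancer combining distribution with monitoring and failover: requests are spread round-robin, and backends marked unhealthy are skipped. Round-robin is only one possible distribution strategy, and the class and backend names are hypothetical, not taken from any cloud load balancer product.

```python
class RoundRobinBalancer:
    """Distributes requests across healthy backends in turn."""

    def __init__(self, backends: list[str]):
        self.backends = backends
        self.healthy = {b: True for b in backends}
        self._next = 0  # index of the next backend to try

    def mark(self, backend: str, is_healthy: bool) -> None:
        # In practice this would be driven by health checks (monitoring).
        self.healthy[backend] = is_healthy

    def pick(self) -> str:
        # Try each backend at most once, starting at the next one in rotation.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if self.healthy[backend]:
                return backend
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([lb.pick() for _ in range(3)])  # ['app-1', 'app-2', 'app-3']
    lb.mark("app-2", False)               # app-2 fails its health check
    print([lb.pick() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```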

3.3. Compute Redundancy and Clustering

Redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing reliability of the system.
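Why redundancy increases reliability can be shown with a short calculation. Assuming the redundant copies fail independently and any single copy is enough to keep the function running, the system is down only when all copies are down at once, so availability becomes 1 - (1 - a)^n. The independence assumption is the important caveat: copies sharing a zone or power supply can fail together, which is one reason to spread them across Availability Zones.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` independent redundant components, any one of which suffices.

    The system is down only if every copy is down at the same time:
    A = 1 - (1 - a) ** n
    """
    return 1 - (1 - component_availability) ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(parallel_availability(0.99, n), 6))
    # 1 0.99
    # 2 0.9999
    # 3 0.999999
```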

Clustering (also known as high-availability clustering or failover clustering) is a mechanism that groups several resources serving a similar purpose as nodes into one cluster. The cluster can then be accessed as one unit, and the nodes within it can be used for load balancing, or for failover from one node to another, when availability or scalability events occur.
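The two ways of using cluster nodes mentioned above, failover and load balancing, correspond to the common active-passive and active-active cluster modes. The sketch below is a hypothetical `Cluster` class that clients address by a single name: in active-passive mode all traffic goes to the highest-priority healthy node, while in active-active mode traffic is spread across all healthy nodes. It is a simplified model, not the behavior of any specific clustering product.

```python
class Cluster:
    """A group of nodes exposed to clients as one logical endpoint.

    mode="active-passive": all traffic goes to the highest-priority healthy node
                           (the others are standbys for failover).
    mode="active-active":  traffic is spread across all healthy nodes
                           (load balancing).
    """

    def __init__(self, name: str, nodes: list[str], mode: str = "active-passive"):
        if mode not in ("active-passive", "active-active"):
            raise ValueError(f"unknown mode: {mode}")
        self.name = name          # the single name clients connect to
        self.nodes = nodes        # in priority order
        self.mode = mode
        self.down: set[str] = set()
        self._counter = 0

    def serving_node(self) -> str:
        """Return the node that should handle the next request."""
        healthy = [n for n in self.nodes if n not in self.down]
        if not healthy:
            raise RuntimeError(f"cluster {self.name} has no healthy node")
        if self.mode == "active-passive":
            return healthy[0]
        node = healthy[self._counter % len(healthy)]
        self._counter += 1
        return node


if __name__ == "__main__":
    ha = Cluster("erp-db", ["node-1", "node-2"], mode="active-passive")
    print(ha.serving_node())   # node-1
    ha.down.add("node-1")      # node-1 fails
    print(ha.serving_node())   # node-2

    web = Cluster("web", ["node-1", "node-2"], mode="active-active")
    print([web.serving_node() for _ in range(4)])  # ['node-1', 'node-2', 'node-1', 'node-2']
```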

3.4. Data Redundancy (Backup, Snapshot, Replication, Recovery)

Data Backup creates a redundant copy of the data, so that it can be used to restore the original after a data loss event.

A Snapshot is one method of creating a backup, and is mostly taken at the volume level.

Data Replication has two categories: Streaming Replication (which can be synchronous or asynchronous) and Backup Replication. Streaming Replication is transaction-based and can happen at the disk, OS, database, or application level. Backup Replication can be snapshot-based (more efficient) or treat backups as files (file-based).
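The practical difference between synchronous and asynchronous streaming replication is what can be lost if the primary fails. In the simplified in-memory model below (a hypothetical `Primary` class, ignoring networks and acknowledgement failures), a synchronous commit completes only once the replica has the transaction, so nothing committed is lost, at the cost of waiting for the replica on every commit. An asynchronous commit returns immediately, and anything not yet shipped to the replica is lost on failure, which translates into a non-zero RPO.

```python
class Primary:
    """Primary database that streams committed transactions to a replica."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.log: list[str] = []          # transactions committed on the primary
        self.replica_log: list[str] = []  # transactions the replica has received
        self.pending: list[str] = []      # committed but not yet shipped (async only)

    def commit(self, txn: str) -> None:
        self.log.append(txn)
        if self.synchronous:
            # Commit completes only after the replica has the transaction.
            self.replica_log.append(txn)
        else:
            # Commit completes immediately; shipping to the replica happens later.
            self.pending.append(txn)

    def ship_pending(self) -> None:
        """Send all not-yet-replicated transactions to the replica."""
        self.replica_log.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails_now(self) -> list[str]:
        """Committed transactions the replica does not have yet."""
        return self.log[len(self.replica_log):]


if __name__ == "__main__":
    for sync in (True, False):
        db = Primary(synchronous=sync)
        db.commit("t1")
        db.ship_pending()
        db.commit("t2")
        db.commit("t3")
        mode = "sync " if sync else "async"
        print(mode, "lost on failure:", db.lost_if_primary_fails_now())
    # sync  lost on failure: []
    # async lost on failure: ['t2', 't3']
```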

Data Recovery is the process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from data backups.
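A basic recovery decision is choosing which backup to restore. The sketch below picks the most recent backup taken at or before the last known-good point in time and reports how much change falls after it. The six-hourly schedule and timestamps are hypothetical. Many databases can additionally replay log backups to reach a more precise point in time; that step is deliberately left out here.

```python
from datetime import datetime


def choose_restore_point(backups: list[datetime], target: datetime) -> datetime:
    """Return the most recent backup taken at or before `target`."""
    candidates = [b for b in backups if b <= target]
    if not candidates:
        raise ValueError("no backup exists at or before the target time")
    return max(candidates)


if __name__ == "__main__":
    # Hypothetical 6-hourly backups on one day.
    backups = [datetime(2023, 10, 30, h) for h in (0, 6, 12, 18)]
    last_good = datetime(2023, 10, 30, 14, 30)  # just before the data was corrupted
    point = choose_restore_point(backups, last_good)
    print(point)              # 2023-10-30 12:00:00
    print(last_good - point)  # 2:30:00 of changes after the chosen backup
```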

3.5. Automation via IaC and CI/CD

Infrastructure as Code (IaC) is the managing and provisioning of infrastructure through code instead of through manual processes, which helps automate infrastructure changes. The benefits of IaC include cost reduction, faster deployments, fewer errors, improved infrastructure consistency, and the elimination of configuration drift. IaC can be integrated into a CI/CD pipeline.
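The drift-elimination benefit comes from comparing a declared desired state against what actually exists. The sketch below computes a simple plan of resources to create, delete, or update. It is similar in spirit to the plan step of declarative IaC tools, but it is a hypothetical illustration, not the algorithm of any particular tool; the resource names and attributes are invented.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired state (from code) with actual state and list required actions."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
    }


if __name__ == "__main__":
    desired = {  # hypothetical resources declared in version-controlled code
        "vm-app-1": {"size": "large", "zone": "zone-a"},
        "vm-app-2": {"size": "large", "zone": "zone-b"},
    }
    actual = {  # what currently exists; vm-app-1 was resized by hand (drift)
        "vm-app-1": {"size": "medium", "zone": "zone-a"},
        "vm-old": {"size": "small", "zone": "zone-a"},
    }
    print(plan(desired, actual))
    # {'create': ['vm-app-2'], 'delete': ['vm-old'], 'update': ['vm-app-1']}
```

Applying such a plan repeatedly brings the environment back to what the code declares, which is how manual, untracked changes are reverted.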

CI/CD is a key part of DevOps practice and greatly automates processes in software development. Although frequent changes may increase the likelihood of impairments, accumulating many changes into one can be more detrimental, due to the complexity of the combined change. Therefore, it is recommended to make small and frequent changes under governance, through Continuous Integration/Continuous Delivery (CI/CD).
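The governance part of CI/CD can be reduced to one rule: every change passes the same ordered gates, and a failing gate stops the change before it reaches production. The sketch below models a pipeline as a list of named checks; the stage names and the failing check are hypothetical, and real CI/CD systems define pipelines in their own configuration formats.

```python
from typing import Callable

Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order; stop at the first failure so a bad change never reaches production."""
    completed: list[str] = []
    for name, stage in stages:
        if not stage():
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    ok, done = run_pipeline([
        ("build", lambda: True),
        ("unit-tests", lambda: True),
        ("integration-tests", lambda: False),  # hypothetical failing check
        ("deploy", lambda: True),
    ])
    print(ok, done)  # False ['build', 'unit-tests']
```

Because each small change runs through this gate on its own, a failure points directly at the change that caused it, which is much harder when many changes are batched together.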

Disclaimer

The blog content does not necessarily represent the official opinion of SAP, Microsoft, Amazon Web Services, or Google Cloud. The opinions appearing in this blog are backed by SAP, Azure, AWS, and GCP documentation, which can be found in the corresponding reference links.
The blog content focuses only on technical discussion; hence it cannot be used as a commercial basis, nor should it be used as official SAP offering documentation.

Acknowledgment to contributors/reviewers/advisors:

Ke Ma (a.k.a. Mark), author, Senior Cloud Architect, RISE Cloud Advisory RA group

Special THANK YOU to RISE with SAP community members, who contributed to this blog:

Ferry Mulyadi, Partner Solution Architect, Amazon Web Services

Micah Waldman, Product Management Lead, Google Cloud Business Continuity

Thorsten Staerk, Customer Engineer, Google Cloud

Frank Gong, Digital Customer Engagement Manager, SAP ECS

Marc Koderer, Chief Architect, SAP ECS

Boris Maeck, Head of Technology and Architecture, SAP ECS

Chun Yuan, DevOps Engineer, SAP BTP Cloud Foundry Platform

Zabala Silvestre, Product Owner, SAP BTP Cloud Foundry Platform

Aaron Smyth, Principal Service Architect, SAP

Sven Bedorf, Head of Cloud Architecture & Advisory, RISE Cloud Advisory, MEE

Kevin Flanagan, Head of Cloud Architecture & Advisory, RISE Cloud Advisory, EMEA North

Luc DUCOIN, Cloud Architect & Advisor Expert, RISE Cloud Advisory, EMEA North

Richard Traut, Head of Cloud Architecture & Advisory, RISE Cloud Advisory, EMEA North

Peter van den Berg, Cloud Architect & Advisor Expert, RISE Cloud Advisory, MEE

Extended Reading: 
Reliability Pillar, from AWS Well-Architected Framework 
AWS Prescriptive Guidance - Resilience lifecycle framework, by AWS 
Disaster Recovery of Workloads on AWS, from AWS Well-Architected Framework 

More of our previous blogs:
DNS integration with SAP RISE in multi-cloud environment series guide – Azure
DNS integration with SAP RISE in multi-cloud environment series guide – AWS
DNS integration with SAP RISE in multi-cloud environment series guide – GCP
Harmonized Single Sign-On for SAP RISE Customers in Multi-Cloud Environment
Demystify Single Sign-On on Server Side for SAP RISE Customers
empower SAP RISE enterprise users with Azure OpenAI in multi-cloud environment
Unlock the Power of Business Data for SAP RISE Customers: Mastering Data Management and Cultivating Insights
Extend the Power of Data for SAP RISE Customers: data federation with SAP in multi-cloud GCP
Extend the Power of Data for SAP RISE Customers: data federation with SAP in multi-cloud AWS
Extend the Power of Data for SAP RISE Customers: data federation with SAP in multi-cloud Azure