Unlocking Data Value Series Blog #2: Data Integration and Modeling with SAP Datasphere

This blog post is part of a blog series where we explore how to leverage SAP BTP Data and Analytics capabilities to unlock the full potential of your data. In this series, we will take you on an end-to-end journey through a real-world scenario, demonstrating how SAP Datasphere, SAP Analytics Cloud, and SAP Data Intelligence Cloud can empower your data-driven solutions.

The blog series consists of four blog posts:

  1. Unlocking Data Value #1: SAP BTP Data and Analytics overview: Learn about our objectives, SAP products, and real-world use case.
  2. Unlocking Data Value #2: Data Integration and Modeling with SAP Datasphere: (Current) Explore integration options and data modeling using Data Builder and Business Builder.
  3. Unlocking Data Value #3: Machine Learning with SAP: Discover SAP HANA ML and Data Intelligence for building and operationalizing machine learning models.
  4. Unlocking Data Value #4: Data visualization with SAP Analytics Cloud (Coming soon): Dive into Business Intelligence and Data Connectivity.

The full content of this series was delivered in live sessions to SAP Partners. It was built in cooperation with Thiago de Castro Mendes, Cesare Calabria, Mostafa SHARAF, Dayanand Karalkar, Ashish Aitiwadekar, and Andrii Rzhaksynskyi.
You can find the complete session recordings here.

Data Integration options in SAP Datasphere:

 

SAP Datasphere provides a range of data integration options. Let’s explore these alternatives:

  1. Virtual Access – SAP Datasphere offers Remote Table Federation, which enables real-time data retrieval from the source system without the need to transfer it to SAP Datasphere. This feature is compatible with various sources and hyperscaler solutions.
  2. Persistence – There are multiple ways to achieve persistence in SAP Datasphere. Remote Tables Replication allows for real-time replication or snapshot creation using single entities. Replication Flow enables replication involving multiple entities and flexible targets. Additionally, views can be materialized to persist the output results.
  3. Data Flow – SAP Datasphere supports the extraction, transformation, and loading (ETL) of data. It provides advanced transformation capabilities and supports Python scripts for enhanced data processing (see the sketch after this list).
  4. External Tools – In addition to its native features, SAP Datasphere allows the use of external tools like SAP Data Services and SAP Data Intelligence. It also supports the import of data from other vendors’ tools using the open SQL schema.
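To make the scripting option more concrete, here is a minimal sketch of a Python script operator as it might look inside a Data Flow, assuming the operator hands the incoming records to a transform function as a pandas DataFrame; the column names (SALES_AMOUNT, CURRENCY) are hypothetical.

```python
# Minimal sketch of a Python script operator in a SAP Datasphere Data Flow.
# Assumption: the operator passes the input as a pandas DataFrame to transform()
# and forwards the returned DataFrame to the next operator / target table.
# Column names are hypothetical.
import pandas as pd

def transform(data: pd.DataFrame) -> pd.DataFrame:
    # Drop incomplete rows, normalize the currency code, and derive a new measure.
    data = data.dropna(subset=["SALES_AMOUNT"])
    data["CURRENCY"] = data["CURRENCY"].str.upper()
    data["SALES_AMOUNT_K"] = data["SALES_AMOUNT"] / 1000
    return data
```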


Data integration options in SAP Datasphere

 

What data source connections does the system support?

– For SAP Applications, the system offers support for both on-premises and public cloud-based systems, including S/4HANA, SAP Business Suite, SAP ECC, and other ABAP-based systems. Dedicated connectors are available for SAP BW and SAP BW/4HANA. Additionally, connectivity to SAP HANA data lake capabilities and various cloud-based Line of Business (LoB) solutions is provided.

– Database Connections: The system allows for connections to different databases, such as SAP HANA, Oracle, and SQL Server, as well as generic databases using JDBC. It also supports other generic connections like OData, SFTP, CDI, or CSV.

– Hyperscalers and Cloud Applications: Connectivity is provided to major hyperscaler storage services like Google Cloud Storage, Azure Data Lake Storage, and Amazon S3. Integration with cloud applications such as Google BigQuery, Azure SQL, Amazon Athena, Amazon Redshift, and open connectors is also supported.

– External Data Integration: It is possible to seamlessly integrate external data sources with SAP Datasphere by writing data into the system as explained previously.
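As an illustration of that last option, here is a minimal sketch of an external Python job pushing records into an Open SQL schema with the SAP HANA client (hdbcli). The host, credentials, schema, and table are placeholders, and a database user with write privileges on the schema is assumed.

```python
# Sketch: writing external data into a SAP Datasphere Open SQL schema via hdbcli.
# Host, credentials, schema and table names are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(
    address="<tenant>.hanacloud.ondemand.com",  # placeholder host
    port=443,
    user="BOOKSHOP#OPEN_SQL_USER",              # hypothetical Open SQL schema user
    password="<password>",
    encrypt=True,
)

rows = [(9001, "2015-06-01", 12.50), (9002, "2015-06-02", 30.00)]
cur = conn.cursor()
cur.executemany(
    'INSERT INTO "BOOKSHOP#OPEN_SQL_USER"."EXTERNAL_SALES" (ORDER_ID, ORDER_DATE, NET_AMOUNT) VALUES (?, ?, ?)',
    rows,
)
conn.commit()
cur.close()
conn.close()
```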

 


Supported Data Source Connectivities in SAP Datasphere

 

Data Modeling in SAP Datasphere:


Data Modeling Layers in SAP Datasphere

 

SAP Datasphere offers a range of modeling capabilities to cater to different types of users. It provides integrated No Code/Low Code editors (Business Builder and Data Builder) for business analysts, and also accommodates external editors for developers and power users.

For the integrated editors:

  • The Business Builder is a key feature that enables self-service modeling. It allows business users to create their own business models using business language, regardless of the data integration layer they are working with. It provides a stable semantic layer for all consumers and the flexibility to change the underlying data layer when needed without disrupting those consumers.
  • On the other hand, the Data Builder is primarily used by IT users and provides both graphical and scripted views. This component is essential for modeling foundational data sources and tables. Developers and modelers utilize tables, views, and intelligent lookups in the Data Builder to combine, clean, and prepare the data.

 

The Data Builder:


Data Builder editors in SAP Datasphere

 

Data Builder is the central hub for data modeling in SAP Datasphere, providing a variety of editors to meet different requirements.

  • The Table Editor allows the creation and maintenance of local tables, editing of remote tables, definition of semantic usage, and management of columns, fields, attributes, measures, and hierarchies, among other features.
  • The Graphical View Editor enables the combination of local and remote datasets into a single data model, incorporating filters, associations, joins, unions, and other operators as necessary.
  • The SQL View Editor offers similar capabilities using the SQL language.
  • The Entity Relationship Editor facilitates the creation of entity relationship models, organizing local tables and views, and establishing relationships between them.
  • The Analytic Model Editor is specifically designed for defining multi-dimensional models for analytical purposes, enabling the modeling of cube-like structures with facts and dimensions.
  • The Data Flow Editor allows the modeling of data flows into SAP Datasphere, including complex transformations.
  • The Intelligent Lookup Editor allows the merging of data from two entities, even when there is no common joining column.
  • The Replication Flow Editor is useful for quickly and easily copying multiple datasets from the same source to the same target, without requiring complex transformations.
  • The Transformation Flow Editor allows loading data from one or more source tables, applying transformations, and writing the result to a target table. You can also load delta changes (including deleted records) from one source table to a target table.
  • Tasks can be grouped into task chains and executed manually or scheduled periodically.

Bookshop Storyline Implementation example:

Let’s return to the bookshop use case that we have been discussing throughout this blog series and explained in the first blog post.

Our data is distributed across three different sources:

  • The bookshop employs SAP S/4HANA Cloud to manage their sales process, housing all current book sales data.
  • SAP HANA Cloud is used to store master data, such as book details, authors, and genres.
  • Lastly, historical sales data is archived in an external Amazon S3 repository.

The objective is to extract meaningful insights from this distributed data to help optimize and boost book sales. To achieve this, the bookshop owner needs comprehensive insights into book sales activities, an understanding of sales trends, and effective book recommendation strategies for customers.

To unlock the data value for our bookshop, we will utilize the SAP Business Technology Platform’s Data and Analytics capabilities, specifically SAP Datasphere, SAP Data Intelligence Cloud, and SAP Analytics Cloud.

Our initial step was to establish connections to the data sources through SAP Datasphere and SAP Data Intelligence Cloud. This process was relatively straightforward, as connectors to the bookshop’s systems already exist. We simply needed to input the system details and credentials.
For the S/4HANA connection, we created a communication arrangement corresponding to the SAP_COM_0532 scenario, suitable for our case.


Data Source connectivities in the context of the Bookshop use case

 

To create stories in SAP Analytics Cloud based on the bookshop data, we needed to create an Analytic Model that encapsulates all the sales data for consumption in SAC. Before creating the Analytic Model, we first had to extract the data from the source systems and prepare the dimensions and facts.


Analytic Model Creation Steps in SAP Datasphere

 


Data modeling steps in SAP Datasphere for the Bookshop use case

 

Here is a breakdown of the steps involved:

1- Book Author View: The initial step was to create a view that joins Author and Book data and combines it into a single local view. This was accomplished using the graphical editor, which allows for drag and drop of the source tables from the SAP HANA connection and includes the necessary join and projection operators.


Book Author View creation with the Graphical Editor
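The join itself is configured visually in the editor, but purely to make the logic concrete, here is a small pandas sketch of the equivalent join and projection; the table contents and column names are hypothetical.

```python
# Illustration only: the join + projection logic of the Book Author view,
# expressed in pandas with hypothetical column names and sample rows.
import pandas as pd

books = pd.DataFrame({
    "BOOK_ID": [1, 2],
    "TITLE": ["The Raven", "Eleonora"],
    "GENRE": ["Poetry", "Short story"],
    "AUTHOR_ID": [10, 10],
})
authors = pd.DataFrame({"AUTHOR_ID": [10], "AUTHOR_NAME": ["Edgar Allan Poe"]})

# Join books to authors, then project the columns the view exposes.
book_author = books.merge(authors, on="AUTHOR_ID", how="inner")
book_author = book_author[["BOOK_ID", "TITLE", "GENRE", "AUTHOR_NAME"]]
print(book_author)
```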

 

Here is a demo of the Book Author View creation:

 

2- Create a Data Flow: We utilized a data flow to retrieve and transform the historical sales data from Amazon S3. The process was straightforward: we selected the source sales file (in our case, a CSV), adjusted the data types for the columns, and stored the result in a local table. Finally, we deployed the table and executed the flow to load the data into the local table.


Integrating Archived Data from Amazon S3 using the Data Flow

 

Here is a demo of the Data Flow creation:

3- Create a Replication Flow: The actual sales data resides in S/4HANA Cloud, and we wanted to retrieve this data using the CDS extraction APIs. In our case, no transformation was required; we only needed to filter on the document type to obtain the book sales records. This presented an opportunity to use the replication flow to achieve one-to-one replication into a target local table.

Here is a demo of the Replication Flow creation:

4- Create the Fact: Next, we created a view that unions the previously replicated book sales tables, which contain both current and archived sales orders. We selected a Semantic Usage of Fact to indicate that the entity contains numerical measures that can be analyzed and used in the final Analytic Model. We specified the measures and attributes and set the OrderID as a key. Additionally, we created associations to link to the time dimension and the initially created Book Author Dimension. To make the view available for consumption outside of SAP Datasphere, we enabled the Expose for Consumption property.


Prepare Bookshop Data in a single view with Semantic Usage of Fact
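Again only as an illustration of the union logic behind the fact view (the actual view is built graphically), here is a small pandas sketch with hypothetical column names, where ORDER_ID plays the role of the key.

```python
# Illustration only: unioning current and archived sales into one fact-like dataset,
# with hypothetical column names and sample rows.
import pandas as pd

current_sales = pd.DataFrame({"ORDER_ID": [5001], "BOOK_ID": [1],
                              "QUANTITY": [3], "NET_AMOUNT": [45.0]})
archived_sales = pd.DataFrame({"ORDER_ID": [1001], "BOOK_ID": [2],
                               "QUANTITY": [1], "NET_AMOUNT": [12.5]})

# Union of the two sources; ORDER_ID identifies each sales order.
sales_fact = pd.concat([current_sales, archived_sales], ignore_index=True)
print(sales_fact)
```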

 

Here is a demo of the Fact creation:

5- Create the Analytic Model: Analytic models are the foundation for making data ready for consumption in SAP Analytics Cloud. We set the Semantic Usage of the final view to Fact and added it to an analytic model to expose it. The analytic model offers more calculations and functionality compared to Analytical Datasets: it allows you to remove unnecessary calculations and offers calculated measures, restricted measures, and an analytical preview. The creation of the analytic model can be triggered from the fact or from the Data Builder menu. We then selected the fact, included all the measures and attributes, and generated the model. We could then see the measures, dimensions, and their associated attributes that can be used in SAP Analytics Cloud. Finally, we could save, deploy, and preview the model’s data.


Analytic Model Creation on top of the previously created Fact

 

Here is a demo of the Analytic Model creation:

Data Consumption:

In SAP Datasphere, the data consumption possibilities can be broadly categorized into the following four groups:

– SAP Datasphere seamlessly integrates with SAP Analytics Cloud for direct consumption of models. Multiple SAP Datasphere systems can be connected to multiple SAC systems.

– There are two possibilities for MS Office integration: live connection support for SAP Datasphere via the SAP Analytics Cloud add-in for Microsoft Office, as well as live connection support via the Excel plug-in.

– Additionally, SAP Datasphere enables read access for external consumption tools or applications via a database user with read privileges using SQL access (see the sketch after this list). It also enables consumption through third-party BI tools and apps.

 – You can leverage SAP’s business expertise and pre-built data models, semantic views, and transformation logic through the Business Content Collection, which continuously grows with new content packages.
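As a sketch of the SQL access path mentioned above, an external tool or script can query an exposed view through a read-only database user. The host, credentials, and object names below are placeholders.

```python
# Sketch: consuming an exposed SAP Datasphere view from an external Python script
# via hdbcli. Host, credentials, space and view names are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(
    address="<tenant>.hanacloud.ondemand.com",
    port=443,
    user="BOOKSHOP#READER",   # hypothetical read-only database user
    password="<password>",
    encrypt=True,
)

cur = conn.cursor()
cur.execute('SELECT TITLE, SUM(NET_AMOUNT) FROM "BOOKSHOP"."V_BOOK_SALES" GROUP BY TITLE')
for title, total in cur.fetchall():
    print(title, total)
cur.close()
conn.close()
```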


Data Consumption options from SAP Datasphere

 

In the next blog post, we will explore how the data modeled in Datasphere is consumed in SAP Data Intelligence Cloud for machine learning purposes. The final blog of the series will focus on SAC, where we will build the story based on the Analytic Model created in SAP Datasphere.

More Information:

 
