SAP BW HANA to Databricks via SAP DI – I

Background:

The announcement of the partnership between SAP and Databricks has been well received across both communities, given that the two technologies are very widely used and better integration between them has long been needed.

 

The partnership primarily improves integration between SAP Datasphere and Databricks. Having said that, many teams still need to figure out how to effectively connect existing SAP BW systems (on-premise or cloud) with Databricks. This blog aims to address this pain point based on our practical experience.

 

High-level end-to-end architecture and requirements

The aim was to extract data from SAP BW into Databricks, run highly compute-intensive calculations there, and bring the results (more than 200 million records) back into SAP BW efficiently enough to meet tight reporting SLAs. In addition, the whole process needed to run fully automated, without any manual intervention. Options such as MuleSoft were evaluated, but after a lengthy analysis we concluded that, for data integration and given our current architecture landscape, SAP Data Intelligence (DI) would serve the purpose better.

[Figure: end-to-end flow]

 

BW to Databricks Flow

How data is picked from BW and sent to Databricks:

The sources of the data are SAP BW providers. Data from these providers is extracted via HANA calculation views, as this is an efficient extraction path from an SAP Data Intelligence point of view.
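For reference, the same calculation view that DI consumes can also be queried directly with the SAP HANA Python client (hdbcli), e.g. to validate the extract. A minimal sketch; the host, credentials, package and view names (my.package / CV_SALES_EXTRACT) are placeholders:

```python
# Minimal sketch: reading a HANA calculation view with the SAP HANA
# Python client (hdbcli). Connection details and view name are
# placeholders for illustration only.
from hdbcli import dbapi

conn = dbapi.connect(
    address="bwhana.example.com",  # HANA host of the BW system
    port=30215,                    # SQL port of the tenant DB
    user="EXTRACT_USER",
    password="********",
)

cursor = conn.cursor()
# Calculation views are exposed under the _SYS_BIC schema as
# "<package path>/<view name>".
cursor.execute('SELECT * FROM "_SYS_BIC"."my.package/CV_SALES_EXTRACT" LIMIT 10')
for row in cursor.fetchall():
    print(row)
cursor.close()
conn.close()
```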

SAP Data Intelligence acts as the middleware that pulls the data from SAP BW and sends it to the Azure landing (Bronze) layer.

In SAP BW, a custom program can be developed to automatically trigger the Data Intelligence graphs once the data load into the source providers completes. The program uses an HTTP connection to call the DI runtime API (/app/pipeline-modeler/service/v1/runtime/graphs) to start the respective graphs.
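In our landscape this trigger is implemented in ABAP via an HTTP destination; the Python sketch below only illustrates the shape of the request. The host, tenant, user and graph ID are placeholders, and the exact payload fields can vary by DI version:

```python
# Minimal sketch of the HTTP call that starts a DI graph. All values
# are placeholders; the payload shape may differ by DI version.
import requests

DI_HOST = "https://di.example.com"  # hypothetical DI instance
ENDPOINT = "/app/pipeline-modeler/service/v1/runtime/graphs"

response = requests.post(
    DI_HOST + ENDPOINT,
    auth=("default\\di_user", "********"),     # DI expects "<tenant>\\<user>"
    headers={"X-Requested-With": "XMLHttpRequest"},
    json={"src": "com.example.bw_to_bronze"},  # fully qualified graph ID (placeholder)
    timeout=30,
)
response.raise_for_status()
print("Graph started:", response.json())
```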

On the target side, the graph uses a standard DI operator to write the data into Azure Blob Storage.

Once the data has landed in the blob, the Databricks Jobs API (/api/2.1/jobs/run-now) is called to start the workflow that loads the data all the way to the Gold layer automatically.
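A minimal sketch of that call; the workspace URL, token, job_id and parameter names are placeholders (in practice the token belongs in a secret store):

```python
# Minimal sketch: starting the landing-to-Gold workflow via the
# Databricks Jobs API. Workspace URL, token and job_id are placeholders.
import requests

WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXXXXXXXXXX"  # personal access token (keep in a secret store)

response = requests.post(
    f"{WORKSPACE}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 112233,  # ID of the landing-to-Gold workflow (placeholder)
        # Optional: pass run-specific parameters, e.g. the landing folder.
        "notebook_params": {"landing_path": "abfss://bronze@storageacct.dfs.core.windows.net/sap_bw/"},
    },
    timeout=30,
)
response.raise_for_status()
print("Run started, run_id =", response.json()["run_id"])
```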

 

[Figure: Databricks layers]

Once the data is in the Gold layer, the required downstream calculations are triggered. If the Gold layer and the calculation layer live in different workspaces, a notebook in the source job can be set up to trigger the calculation job/workflow via the Jobs API (/api/2.1/jobs/run-now). After the calculations complete, the results are parked in a Delta table and the status tables (used by DI) are updated, as sketched below.
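A minimal sketch of this cross-workspace hand-off, run from a notebook in the Gold workspace (so dbutils and spark are available as notebook globals); the second workspace URL, secret scope, job_id and status table name are placeholders:

```python
# Minimal sketch: a notebook in the Gold workspace triggers the
# calculation job in a second workspace, waits for it to finish, then
# records the outcome in a status table that DI checks. All names and
# IDs are placeholders.
import time
import requests

CALC_WORKSPACE = "https://adb-9876543210987654.5.azuredatabricks.net"
TOKEN = dbutils.secrets.get("sap-di", "calc-ws-token")  # token for the calc workspace
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

run = requests.post(
    f"{CALC_WORKSPACE}/api/2.1/jobs/run-now",
    headers=HEADERS,
    json={"job_id": 445566},  # calculation workflow (placeholder)
    timeout=30,
).json()

# Poll until the calculation run reaches a terminal state.
while True:
    state = requests.get(
        f"{CALC_WORKSPACE}/api/2.1/jobs/runs/get",
        headers=HEADERS,
        params={"run_id": run["run_id"]},
        timeout=30,
    ).json()["state"]
    if state.get("life_cycle_state") in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(60)

# Update the status table (placeholder name) that DI polls before
# pulling the results back to BW.
spark.sql(
    "INSERT INTO ops.di_status VALUES "
    f"({run['run_id']}, '{state.get('result_state', 'UNKNOWN')}', current_timestamp())"
)
```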

 

[Figure: Gold to calculation workspace]

I will cover the Databricks to BW flow in the next part of this blog series.

 
