Blog Data Fabric

How to Move Data from SAP Hana to Snowflake

Why Move Data from SAP to Snowflake?

A lot of enterprises have built their business on SAP. SAP is a solid foundation, but as the amount of data-hungry applications grows, the transactional nature of SAP can lead to delays. There are also the concerns about who in the organization needs access to the data stored in SAP, such as data analysts and data scientists, and which non-transactional applications shouldn’t be provided access at all.

A more convenient solution is to move the data out of SAP and into a data warehouse. In the on-prem world, this data warehouse can be a SAP product or Teradata/Exadata, etc. With the advent of cloud data warehouses like Google Bigquery, Amazon Redshift, Azure Synapse, and Snowflake, it has become increasingly easy to house this data in multiple ways and places.

The issue then becomes moving data between SAP and your data warehouse or database and keeping everything synched. This article covers how to sync data between SAP and a cloud data warehouse; in this example, we are using Snowflake but any other cloud data warehouse would follow the same procedure.

The journey to a cloud data warehouse can be complex and should be broken down into bite-sized pieces.

Step 1: Analyze – What needs to get to Snowflake?

Which databases & tables need to be made available?
Which users, roles, and applications have access to these databases and tables?
How is data getting loaded into these tables – which scripts and applications are responsible for the data load?
How often is the data updated in these tables?
What are the consumption patterns of this data?

Document your answers and use this list to evaluate the level of support for these inputs and outputs in Snowflake. Additionally, use this list as an input to the QA process.

Step 2: Low Level Design/Execution Plan

Now that you have an understanding and have documented your current requirements, let’s lay out an execution plan for the data movement.

Instead of an all-at-once approach to syncing, we recommend a phased approach. Move low-impact tables, databases, and applications first before moving on to the complex syncing tasks.

No matter which approach you take, by the end of this step you should have a plan to sync the data.

Take the output from the previous Analysis step and divide up the tables and databases into logical phases. Phase 1 should typically involve tables that need minimal changes and are low-impact to your business needs.
It is best to plan a full vertical slice – i.e. end-to-end ingestion, data movement, and consumption together. This helps isolate issues at every stage early in the process.
Identify tools that can help speed up the process. It is generally not recommended to hand-code.
Try out DataOps tools like Nexla for executing the process. They can reduce time to market substantially by automating a significant portion of the syncing and re-tooling. This becomes especially beneficial for repeatable steps in the phased approach.

Step 3: Set Up Hana & Snowflake Account

With your plan of action in hand, let’s get started executing it. Firstly, you’ll want to set up your Hana and Snowflake account to meet your needs.

Use the Snowflake UI/CLI to configure the following on Snowflake:

Create databases and warehouses on Snowflake.
Create users and accounts on Snowflake.

Step 4: Build Data Extractor from SAP

SAP supports connection via APIs and ODBC/JDBC drivers and you can write code in your favorite programming language and extract data from SAP. When extracting data, it is best if all custom fields are extracted and type information is preserved as type information helps to create tables in Snowflake later. It is best to use a typed format to store this data, so try to avoid CSVs in favor of JSON/AVRO formats.

Step 5: Create Snowflake Tables

This step involves creating the Snowflake table for the extracted data. It is best to map SAP field types to Snowflake field types. Having a typed format in Step 4 makes this really easy, though you may still need to rename columns not adhering to the Snowflake column naming conventions

Step 6: Load into Snowflake

Use the snowflake COPY command to bulk load the files created in Step 4 into Snowflake.

Step 7: Do it on Schedule

Integrate a scheduler (Quartz, Airflow, etc) to run steps 1-6 on the desired frequency.

Step 8: Optimizations

Now that your data has been successfully loaded into Snowflake, you may be wondering about optimizing the process. How can you save time and automate this? What if you could update only things that had been changed, or make it easier for more applications to access SAP?

Delta Loads

One solution is using deltas to load in data. You can snapshot data once from SAP and load it into Snowflake so you only have to load deltas going forward. This does require you to remember the last loaded row from SAP Hana, though; simply follow step 6 of this guide again with the new data.

APIs to Snowflake

Providing access to SAP can be a reason to hesitate, but providing access to Snowflake is much easier. You can build APIs to Snowflake to provide access to other applications or share tables with them.

Can I use a tool?

Naturally, there are tools to automate and improve the process. Nexla makes it easy to perform all these steps in a few clicks. Simply add your credentials to connect to SAP/Snowflake, select your tables, and you’re done. Some Nexla customers have synced data between SAP Hana and Snowflake in less than 10 minutes.

To learn more about how Nexla can save you time and accelerate your time-to-value, request a free demo today!

Tags: Data Integration