
How to Work with Data When You Have a Target Schema

Background

What is a target schema?

When you move data to a destination, the destination may expect data in a specific schema.

The destination schema, or target schema, defines and constrains the incoming data as you write it to an API or a database.

Database schema: A database schema describes how a database is structured. It includes logical constraints such as table names, fields, data types, and the relationships between them.
API schema: An API (application programming interface) allows separate pieces of software, services, or platforms to communicate and exchange data through a request-response message system. An API schema plays a role similar to a database schema: it is the blueprint for how the data in those requests and responses is structured.
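To make the database-schema definition concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users and orders tables and their columns are hypothetical, loosely modeled on e-commerce order data. The point is that the schema (table names, column types, and the foreign key relating orders to users) is stored in the database itself and can be read back programmatically.

```python
import sqlite3

# In-memory database for illustration only; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL
);
CREATE TABLE orders (
    order_number  INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    purchase_date TEXT NOT NULL,
    order_total   REAL NOT NULL
);
""")

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(orders)")]
print(columns)

# PRAGMA foreign_key_list describes relationships to other tables:
# (id, seq, table, from, to, on_update, on_delete, match).
fks = [(row[2], row[3], row[4]) for row in conn.execute("PRAGMA foreign_key_list(orders)")]
print(fks)
```

Any tool writing into the orders table has to respect these declared names, types, and relationships, which is exactly what makes it a target schema.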

What does it mean to work with a target schema?

Data has a shape that depends on where it comes from: each record has a general structure and a set of attributes. Let’s say an e-commerce company needs the data generated when customers purchase items on its website to be reflected in its shipping system. The order data might look like this: Order_Number, User_ID, Item_ID, Number_of_Items, Purchase_Date, Order_Total, User_Address, Shipping_Cost, Payment_Info, Billing_Address, etc.

To transfer this information to the shipping system, the shape of the data needs to be molded into the target schema. The API that allows the website to communicate with the shipping system needs specific pieces of order information to be transformed and organized in a certain way. For example, some data fields such as Shipping_Cost and Billing_Address might be stripped from the incoming data. Payment_Info might need to be masked or encrypted as it likely contains sensitive credit card information. The User_Address might need to be parsed and rearranged into a format that the shipping company uses. In general, the API expects the incoming data to comply with certain rules for each data record.
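The transformations described above can be sketched in a few lines of Python. Everything here is hypothetical: the target field names, the assumption that User_Address arrives as “street, city, ST zip”, and the choice to forward only the last four card digits as the masking rule. A real shipping API would define its own schema and masking requirements.

```python
from typing import Any

# Hypothetical field order expected by the shipping API (not a real API contract).
TARGET_FIELDS = ("order_number", "item_id", "quantity",
                 "street", "city", "state", "zip", "payment_last4")


def mask_card(payment_info: str) -> str:
    """Keep only the last four digits so the full card number is never forwarded."""
    digits = [c for c in payment_info if c.isdigit()]
    return "".join(digits[-4:])


def parse_address(address: str) -> dict[str, str]:
    """Split 'street, city, ST 12345' into parts (assumes exactly this layout)."""
    street, city, state_zip = (part.strip() for part in address.split(","))
    state, zip_code = state_zip.split()
    return {"street": street, "city": city, "state": state, "zip": zip_code}


def to_shipping_record(order: dict[str, Any]) -> dict[str, Any]:
    """Map a source order onto the target fields; anything not listed is dropped."""
    record = {
        "order_number": order["Order_Number"],
        "item_id": order["Item_ID"],
        "quantity": order["Number_of_Items"],
        "payment_last4": mask_card(order["Payment_Info"]),
        **parse_address(order["User_Address"]),
    }
    # Shipping_Cost, Billing_Address, etc. are never copied, so they are stripped.
    return {field: record[field] for field in TARGET_FIELDS}


order = {
    "Order_Number": 1001, "User_ID": 42, "Item_ID": "SKU-7", "Number_of_Items": 2,
    "Purchase_Date": "2021-03-14", "Order_Total": 59.98,
    "User_Address": "12 Main St, Springfield, IL 62701",
    "Shipping_Cost": 5.0, "Payment_Info": "4111 1111 1111 1111",
    "Billing_Address": "12 Main St, Springfield, IL 62701",
}
shipping_record = to_shipping_record(order)
print(shipping_record)
```

Building the output from an explicit list of target fields, rather than deleting unwanted source fields, means any new field added upstream is dropped by default instead of leaking into the destination.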

Similarly, moving data between internal systems and databases also requires you to comply with the target schema. For example, the marketing department may need to transfer customer data into Google BigQuery. Depending on the warehouse’s specific requirements, some field names and data types need to be converted to a uniform standard, and records must be restructured to fit the target’s logic.
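As a hedged illustration of converting fields to a uniform standard, the sketch below normalizes column names to lowercase snake_case, converts US-style dates to ISO 8601 strings, and turns currency strings into numbers. These conventions and the column names are assumptions chosen for the example, not documented BigQuery requirements, and loading the rows with a BigQuery client is not shown.

```python
import re
from datetime import datetime


def normalize_name(name: str) -> str:
    """'Signup Date' -> 'signup_date': runs of non-alphanumerics become '_', then lowercase."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


# Hypothetical per-column converters to the target types.
CONVERTERS = {
    "signup_date": lambda v: datetime.strptime(v, "%m/%d/%Y").date().isoformat(),
    "lifetime_value": lambda v: float(v.replace("$", "").replace(",", "")),
}


def normalize_row(row: dict[str, str]) -> dict[str, object]:
    out = {}
    for raw_name, value in row.items():
        name = normalize_name(raw_name)
        convert = CONVERTERS.get(name, str)  # columns without a converter stay strings
        out[name] = convert(value)
    return out


row = {"Customer Name": "Ada Lovelace", "Signup Date": "03/14/2021", "Lifetime Value": "$1,234.50"}
normalized = normalize_row(row)
print(normalized)
```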

The Challenge

Mapping information to the target schema can be done manually, but doing so has several downsides. First, mistakes are inevitable. If a single field name is entered incorrectly, the API will return an error and interrupt the data flow, or the information will be stored incorrectly or incompletely in the database. Such small mistakes can be costly: a single one might cause a delivery error, which may turn into a negative review and hurt future sales. When multiple people work on the same dataset, errors compound and become difficult to trace.
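One way to catch a mistyped field name before it reaches the destination is to validate each record against the target schema first. This is a minimal sketch with a hypothetical schema of field names and expected Python types. It reports unexpected fields (suggesting the closest valid name via the standard library's difflib), missing fields, and wrong types.

```python
import difflib

# Hypothetical target schema: field name -> expected Python type.
TARGET_SCHEMA = {"order_number": int, "item_id": str, "quantity": int, "zip": str}


def validate(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record conforms."""
    problems = []
    for field in sorted(record.keys() - TARGET_SCHEMA.keys()):
        hint = difflib.get_close_matches(field, list(TARGET_SCHEMA), n=1)
        suffix = f" (did you mean {hint[0]!r}?)" if hint else ""
        problems.append(f"unexpected field {field!r}{suffix}")
    for field, expected in TARGET_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field {field!r}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field!r} should be {expected.__name__}, "
                            f"got {type(record[field]).__name__}")
    return problems


# A record with one typo in a field name and one value of the wrong type.
bad_record = {"order_number": 1001, "itme_id": "SKU-7", "quantity": "2", "zip": "62701"}
report = validate(bad_record)
for problem in report:
    print(problem)
```

Because isinstance treats True as an int, a stricter validator would special-case booleans; the sketch keeps things simple.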

Another factor that slows down integration is identifying the destination’s schema requirements from its API responses and then implementing them. The process is time-consuming and must be repeated for every data flow, which delays data delivery.

The Solution

The solution contains two parts: automation and standardized collaboration. A typical organization has to deal with dozens of databases and APIs. The best way to save time and avoid mistakes is automating schema recognition and saving the schema as an editable template that can be used repeatedly and shared across collaborators.
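The idea of a saved, editable, shareable template can be sketched generically as a JSON file that several people or pipelines read and update. The file layout below (name, description, fields) is an assumption for illustration and is not Nexla’s template format. A temporary directory stands in for shared storage.

```python
import json
import tempfile
from pathlib import Path


def save_template(path: Path, template: dict) -> None:
    """Write the template as readable JSON so collaborators can review and edit it."""
    path.write_text(json.dumps(template, indent=2, sort_keys=True))


def load_template(path: Path) -> dict:
    return json.loads(path.read_text())


# Hypothetical template for a shipping API destination.
shipping_template = {
    "name": "shipping-api",
    "description": "v1, 2021-03",
    "fields": {"order_number": "integer", "item_id": "string",
               "quantity": "integer", "zip": "string"},
}

with tempfile.TemporaryDirectory() as shared_dir:
    path = Path(shared_dir) / "shipping-api.json"
    save_template(path, shipping_template)

    # A collaborator (or a second pipeline) reloads the same template and edits it.
    reused = load_template(path)
    reused["fields"]["carrier"] = "string"
    reused["description"] = "v2, adds carrier"
    save_template(path, reused)

    latest = load_template(path)

print(latest["description"], sorted(latest["fields"]))
```

Keeping the template as data rather than hard-coding it in each pipeline is what allows it to be edited once and reused everywhere.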

Nexla is the only product on the market that allows users to create, save, implement, share, and manage schema templates, all within a few clicks. Nexla combines automation with flexibility: rather than enforcing compliance with the target schema through a black box, it helps users visualize the data, so they keep granular control over it and understand what is happening to it.

Let’s walk through an example.

A schema template is a great time saver when building a pipeline to an API because it eliminates manual schema recognition and guides the entire transformation process. Once your API destination is ready, you can start with Nexla’s automatic schema template building.

First, find “Schema Library” under “Tools” and create a new schema template in your library. Label the new template with the API name in the “Name” field, and enter the version, date, or any other information that helps you identify the template in the “Description” field.

Then, paste an API response or input samples that show the desired schema, and Nexla automatically generates a visual version of the target schema on the right side, making it clear what the data will look like. Click the “Create” button to save the template for transforming data as it moves through the pipeline.
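The post does not describe how Nexla derives a schema from a pasted sample, so the following is only a generic sketch of the underlying idea: walk a parsed JSON sample and record the type of every field. It makes simplifying assumptions (arrays are described by their first element, and JSON null maps to “null”); a production tool would merge many samples and handle optional fields.

```python
import json
from typing import Any


def infer_schema(value: Any) -> Any:
    """Describe the shape of a parsed JSON value: objects become dicts of
    field -> schema, arrays are described by their first element, and
    scalars by a JSON type name."""
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, list):
        return [infer_schema(value[0])] if value else []
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"  # json.loads yields None for JSON null


# Hypothetical API response pasted as a sample.
sample_response = """
{"order_number": 1001,
 "items": [{"item_id": "SKU-7", "quantity": 2}],
 "address": {"city": "Springfield", "zip": "62701"},
 "expedited": false,
 "weight_kg": 1.5}
"""
schema = infer_schema(json.loads(sample_response))
print(json.dumps(schema, indent=2))
```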

Next, choose the starting point of your pipeline: the source. Authenticate your credentials, import and select your dataset, then click “Create Dataset” to continue to the interface where the transformation takes place.

Click the icon in the upper-right corner, then find and select the schema template you just created from the drop-down list.

Transform the data attributes according to the target requirements listed on the right side. Then click the “Next” button and select the destination, the same way you connected to the source. Voila, the pipeline is now ready to deliver data.

The template you just created also helps when building other pipelines that share the same destination. You can reuse the template with any data input, and annotate and modify it in your library. The template can be shared with any collaborators you grant access to, so no additional work is required when multiple people or pipelines work with the same destination.

The Result

Schema templates not only automate the compliance process but also provide visual cues that let you directly see what the data you are working with should look like without reading through documentation. With Nexla, writing data to destinations no longer requires coding and tedious manual validation. Everyone who works with data can now send data anywhere with an intuitive and error-free method.

This past year, billions of records of high-quality data passed through Nexla to collect dinner orders, tutor availability, and grocery inventories, keeping businesses running and thriving. Nexla was named a Gartner Cool Vendor and a Strong Performer in the Gartner Peer Insights report. Request a demo and try it out for your organization today!