Automating  workflows from emails to data warehouse

Automating workflows from emails to data warehouse

The Problem:

A customer’s workflows relied on incoming data from partners, but the partners were only able to share data in emails as excel attachments. The customer wanted an automated workflow where email receipt would immediately trigger ingestion, modification, and delivery of the data from the attachments to their Redshift.
Effectively they needed the string of emails to become an automated pipeline into Redshift.

But the delivery mechanism of multi-tab excel attachments that could come at any time of the day posed several challenges for the customer. Apart from having a generic solution that could reliably extract valuable data from each sheet of each attachment, supporting each new integrations would incur considerable resource and time costs. Additionally, since email generation was outside their control, handling common issues like change in file structure, missing emails, and data processing errors would be a considerable challenge.

The Solution:

The customer’s analyst team had been using Nexla to manage some of their most complex external data feeds and knew that Nexla’s core platform is highly flexible in handling unique data types. Particularly relevant was the ability to extract datasets from multi-tab excel files which may have formulas, merged cells, or often have data and text content mixed together.When posed with this use case, true to our promise of powering “Intelligent, inter-company data pipelines”, we spun out an email connector that delivers both simplicity and scalability.

Setting up the source required the analyst to execute two simple steps:

    • On Nexla UI, they generated a unique email id for their email source. This required no prior information about what type of emails would be forwarded to this email.
    • On their email client, they would up a simple forward rule to send all emails to this unique email id.

That’s it! Once setup Nexla started monitoring any incoming emails to that email address, extracting data from the attachments, and detecting whether the data mapped into a new dataset or an existing one. The datasets then showed up on Nexla UI, ready to be mapped into a data warehouse as an automated pipeline.

Email Pipeline

As Nexla harmonizes datasets no matter what the source and format of data, all Nexla capabilities that the analysts used for their other pipelines were available on these pipelines too. So they could transform the data if they needed to before pushing to the warehouse. Additionally, with Nexla’s powerful data lineage mechanism, each record could be traced back all the way to corresponding originating email.

With Nexla’s automatic schema detection and schema evolution capabilities, the customer had a choice of using one unique email address across all partners or using different email ids per partner. And while the initial use case was specifically for multi-tab email attachments, Nexla’s solution allowed them to scale to email attachments of any file type supported by Nexla connectors – XLS, CSV, TSV, JSON, XML, Avro, Parquet, and log format.

The Results:

    • Automation from Email to Redshift created and managed by analysts without having to write a single line of code
    • Automated pipelines that extract data from each attachment, transform to desired output structure, and write out to Redshift
    • Full monitoring and traceability of records all the way to the originating email
    • Alerts and notifications on delayed emails, email format changes, and any issues on writing to the database