How to Futureproof the Modern Enterprise Data Stack

The modern data stack is emerging as the go-to cloud data architecture for analytics, machine learning, and to some degree business operations. However, as a business grows in complexity and data maturity, one data stack will no longer be enough.

In short, the modern data stack will become modern data stacks.

In adopting the modern data stacks approach, it is important to make sure that you are future-proofing against the inevitable growth in data variety and complexity of your data-powered applications.

Before we talk about the “what” of modern data stacks, let’s start with the “why” and a few real-life scenarios. The appetite for data is never-ending. It’s not uncommon for large enterprises to run many types of databases: transactional, graph, document-based, in-memory, time-series, key-value, globally distributed, and more. Additionally, data is constantly coming in from everywhere: file and object stores such as FTP, S3, shared drives, and Dropbox, as well as streaming sources, events, IoT devices, webhooks, and APIs. That’s how enterprises run in today’s data-heavy world.

Consider a few scenarios:

  • A financial risk team is running an AI-based model to detect fraud in online purchases in real time. This is a real-time data stack in a transaction flow.
  • A business development team uses Salesforce as the single source of truth but consumes data and gets insights for quarterly planning in a Tableau dashboard that’s sitting on top of Redshift. This is a traditional analytics stack on a data warehouse.

Each of these use cases requires its own modern data stack. Hundreds of use cases mean hundreds of modern data stacks in the company.

The History of the Modern Data Stack

The modern data stack emerged as a term in 2020 and rose to prominence in 2021. It started with the observation that the cloud data warehouse had become the anchor piece of the data stack. In the early 2010s, the goal was for companies to capture every data point they could and use Hadoop-based data lakes to store and query that data. The technical complexity of Hadoop implementations, performance limitations, and finally the realization that most enterprises were at a multi-terabyte, not a multi-petabyte, scale paved the way for cloud data warehouses. These systems offered storage capacity large enough for most enterprises, along with unprecedented ease of use, maintainability, and nearly database-like performance.

Today’s modern data stack includes the tools that power and enable an ecosystem around the cloud data warehouse.

BI, analytics, and machine learning are key data applications for today’s enterprise, with the cloud data warehouse at the center. Innovations in cloud data warehouse technology catalyzed the creation of the supporting technologies that make up the rest of the stack.

Limitations of the Current Modern Data Stack

There are real limitations to the current modern data stack approach. In a blog post, Tristan Handy, founder of dbt Labs, does a good job of outlining the modern data stack and sharing its limitations:

  1. Batch-based: With the warehouse as a core, most modern data stack components work in batch mode.
  2. A one-way road: In modern data stacks, data converges into the warehouse because that’s where data is meant to be used. That leaves out use cases where data needs to leave the warehouse and move into other systems.
  3. Bridge not yet built to data consumers: For a non-technical data user, the modern data stack can still be very daunting. Writing SQL transforms, for example, can be easy to start but very hard to master.
  4. Governance is not mature: The modern data stack was born out of the need to overcome the slowness of legacy systems. With most of the focus on time-to-value, governance took a back seat.
  5. Collection of point tools: Today’s modern data stack can feel like a loose collection of disparate tools. What works for point use cases in small teams becomes unmanageable for enterprises.
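
To make the first limitation concrete, here is a minimal Python sketch of how batch loading delays data. It assumes, purely for illustration, that loads fire on a fixed schedule starting at time zero and complete instantly; the function names are hypothetical and do not come from any particular tool.

```python
import math


def batch_visible_at(event_time_s: float, batch_interval_s: float) -> float:
    """Return when an event becomes queryable in the warehouse.

    Illustrative assumption: loads fire at t = 0, interval, 2 * interval, ...
    and each load is instantaneous.
    """
    if batch_interval_s <= 0:
        raise ValueError("batch_interval_s must be positive")
    # The event is picked up by the first scheduled load at or after it occurs.
    return math.ceil(event_time_s / batch_interval_s) * batch_interval_s


def staleness(event_time_s: float, batch_interval_s: float) -> float:
    """Seconds between an event occurring and it being visible downstream."""
    return batch_visible_at(event_time_s, batch_interval_s) - event_time_s


# A purchase made 10 seconds after an hourly load waits for the next load.
delay_s = staleness(10, 3600)
print(f"Visible after {delay_s} seconds")
```

Under these assumptions, an event that just misses an hourly load waits nearly the full hour, which is why a batch-based stack is a poor fit for something like real-time fraud detection in a transaction flow.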

Some of these deficiencies are so severe that they make you wonder whether a data stack that isn’t real-time, lacks proper governance, and is not easy for data consumers to use still deserves to be called “modern”.

Understanding the Root of the Limitations

The limitations of modern data stacks come primarily from three things:

  • Data warehouse core: Although modern cloud data warehouses can be powerful, when data in motion hits a warehouse it comes to a grinding halt.
  • Point use cases: While a modern data stack is a great start to bringing data within reach of so many non-technical users at the business level, that success has created a new challenge: how do you uplevel the modern data stack to the more complex use cases?
  • Connector limitations: If all data has to be in a cloud warehouse, then connectors are necessary from every possible system, not just the popular ones. Unfortunately, the modern data stack requires a separate connector for each data source, so any source without a connector is cut off from the data flow.
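
The connector limitation can be sketched in a few lines of Python. This is a simplified illustration, not how any specific product works: the `load_into_warehouse` function, the connector registry, and the sample records are all hypothetical. It shows that when every source needs its own connector, data from the long tail of sources simply never reaches the warehouse.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Connector = Callable[[], Iterable[Record]]


def load_into_warehouse(
    sources: List[str], connectors: Dict[str, Connector]
) -> Tuple[List[Record], List[str]]:
    """Pull records from every source that has a connector.

    Returns the loaded records plus the names of sources that were skipped
    because no connector exists for them.
    """
    loaded: List[Record] = []
    blocked: List[str] = []
    for source in sources:
        connector = connectors.get(source)
        if connector is None:
            # The long tail: the data exists, but there is no way to move it.
            blocked.append(source)
            continue
        loaded.extend(connector())
    return loaded, blocked


# Hypothetical connectors; real ones would call each system's API.
connectors: Dict[str, Connector] = {
    "salesforce": lambda: [{"source": "salesforce", "account": "Acme"}],
    "s3": lambda: [{"source": "s3", "key": "orders.csv"}],
}

records, blocked = load_into_warehouse(
    ["salesforce", "s3", "ftp", "webhook"], connectors
)
print(f"Loaded {len(records)} records; no connector for: {blocked}")
```

In this sketch only the two popular sources flow; the FTP drop and the webhook stay stranded until someone builds and maintains a connector for each.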

Futureproofing the Modern Data Stack

The modern data stack as we know it today is already hitting some limits. To future-proof it, the following aspects need to be thought through:

  • Decentralized Data: How are data users going to work with data that is not in a cloud data warehouse?
  • Data Discovery & Collaboration: How are data users going to find the data they need? How can they collaborate with others on data projects?
  • Long Tail Integrations: How can data users build data flows for an ever-increasing number of sources and destinations with minimal reliance on technical data engineering teams?
  • Real-time and Streaming Data: How do data users and apps get real-time and streaming data?
  • Data Ownership: How can data users and domain teams get the speed and agility that they need while complying with enterprise-wide security and governance requirements?
  • Data Observability: As more data flows through the enterprise, how do data quality and data issues get monitored and acted upon?
  • Data Fabric: Adding a data preparation layer like Nexla to a data fabric-based architecture unifies and automates many complexities of data management. This layer sits between the data infrastructure and the consumption layer and makes it easy for organizations to accelerate their data journey.
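
As one illustration of the data observability question above, the following Python sketch checks a batch of records for two common problems: too many missing values in a required field, and data that has not been refreshed recently. The thresholds, field names, and the `check_batch` function are assumptions made for this example; a production setup would typically rely on a dedicated observability tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def null_rate(records: List[Record], field: str) -> float:
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if record.get(field) is None)
    return missing / len(records)


def check_batch(
    records: List[Record],
    required_fields: List[str],
    max_null_rate: float,
    last_loaded_at: datetime,
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return human-readable issues found in a batch (empty list if healthy)."""
    issues: List[str] = []
    for field in required_fields:
        rate = null_rate(records, field)
        if rate > max_null_rate:
            issues.append(
                f"{field}: null rate {rate:.0%} exceeds {max_null_rate:.0%}"
            )
    age = now - last_loaded_at
    if age > max_age:
        issues.append(f"stale: last load was {age} ago")
    return issues


# Hypothetical batch: one null amount, one missing amount, loaded 3 hours ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
issues = check_batch(
    records=[{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 3}],
    required_fields=["id", "amount"],
    max_null_rate=0.5,
    last_loaded_at=now - timedelta(hours=3),
    now=now,
    max_age=timedelta(hours=1),
)
for issue in issues:
    print(issue)
```

Returning a list of issues rather than raising on the first failure lets a monitoring job report everything wrong with a batch at once and decide separately how to act on it.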

Futureproofing the Modern Data Stack with Data Fabric and Data Mesh

A truly modern data stack keeps the best of the original data stack but also brings in elements from data fabric architecture and data mesh principles to address the gaps identified above. Here is what the modern data stack 2.0 looks like:

[Figure: The modern data stack 2.0, upgraded with data fabric and data mesh]

Modern Data Stacks and the Future

Being able to access, transform, load, secure, and use data from any source is an integral part of any modern data solution. By bringing in elements of data fabric and data mesh, modern data stacks cover the shortcomings of older data storage solutions while pushing past the limits that data stacks are beginning to run into. Whether you’re building a data solution from scratch, upgrading an outdated system, or looking for more versatility and functionality from your data stack, consider the benefits of modern data stacks and how they are growing with modern data consumption needs. The future is here and it’s data; can your data solution keep up?

Check out our webinar on Building a Data Mesh to get started on yours today!