The term “DataOps” or “Data Operations” is having a moment. It could well be that 2017 turns into the year of DataOps, with startups, incumbents, and analysts all using this term. But what does it mean and why is it bubbling to the surface now? In this post we present Nexla’s view of what DataOps is, and why it’s so important.
The Nexla team cut its big data teeth in AdTech, where “AdOps” is a critical function responsible for the delivery of online ad campaigns. “Operations” more broadly is a business function that seeks to maximize efficiency between inputs and outputs; it was born during the industrial revolution and refined by Ford’s assembly-line production. With this background, it was natural to think of what we are building as a tool for “Data Operations.” It turns out we weren’t the only ones thinking this way.
Tamr CEO Andy Palmer wrote about this topic nearly two years ago, from a DevOps perspective. Blue Hill Research’s Toph Whitmore describes DataOps as “an enterprise collaboration framework that aligns data-management objectives with data-consumption ideals to maximize data-derived value.” Our take is a little different, as explained below.
In hundreds of conversations with customers, investors, and other data professionals, we’ve found that nearly everyone has heard the term before but isn’t quite sure what it means. When asked to describe DataOps, most people intuitively understood it had something to do with moving data to the right place in the right format. To move the conversation forward, we need a clear definition we can all use. At Nexla, we believe:
DataOps is the function within an organization that controls the data journey from source to value.
We believe the goal of DataOps is to take data from its source (or creation, as Whitmore notes) and deliver it to the person, system, or application that can turn it into business value. That business value could be an input to a model, analytical insight, or even a revenue-generating data product. DataOps, just like operations on a factory floor, controls the process of production that efficiently turns inputs into outputs.
The Data Journey: From Source to Value
We illustrate the data journey with the above graphic. Starting at the left, we have data sources (or data creation; IoT sensors and connected cars fit here). These sources can be internal or external, and their number will only increase as machine learning models require ever more data.
Data is then connected to a data pipe. At Nexla we believe this can be a “smart pipe” that helps prep, clean, and transform data before it is delivered to its destination. This smart pipe can span organizations, allowing both the sending and the receiving company to transform data to suit their needs. The ultimate destination could be another database, an API, or even a simple CSV file. At the end of the data journey, the data should be ready for business intelligence, advanced analytics, or machine learning. This is the ultimate goal of the data: to provide business value.
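To make the idea concrete, here is a minimal sketch of what a “smart pipe” could look like: an ordered chain of prep, clean, and transform steps applied to each record before delivery. All function and field names here are hypothetical, chosen for illustration; they are not Nexla’s actual implementation.

```python
def drop_empty_fields(record):
    """Prep: remove fields whose values are missing."""
    return {k: v for k, v in record.items() if v not in (None, "")}

def normalize_keys(record):
    """Clean: lowercase field names so schemas match downstream."""
    return {k.lower(): v for k, v in record.items()}

def to_destination_format(record):
    """Transform: rename fields to what the receiving system expects."""
    renames = {"ts": "timestamp"}  # hypothetical mapping
    return {renames.get(k, k): v for k, v in record.items()}

def smart_pipe(records, steps):
    """Run every record through each step, in order."""
    for record in records:
        for step in steps:
            record = step(record)
        yield record

source = [{"TS": "2017-05-01", "Clicks": 42, "Note": ""}]
delivered = list(smart_pipe(source, [drop_empty_fields, normalize_keys, to_destination_format]))
# delivered[0] == {"timestamp": "2017-05-01", "clicks": 42}
```

Because each step is an independent function, either the sending or the receiving side could contribute steps to the same chain, which is the point of a pipe that spans organizations.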
Throughout the data journey, DataOps must monitor the data and secure and manage access to it, all while providing ease of use and discoverability. Let’s take each of these requirements in turn.
Monitor: Monitoring the entire data journey is of critical importance. It means tracking which data streams are operational and being alerted immediately when a schema changes or an anomaly is detected. As the number of integrated data streams increases, manual monitoring becomes untenable.
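Schema-change detection, in particular, can be automated with very little machinery. The sketch below treats a schema as the set of (field name, value type) pairs seen in a record and reports any drift; it is illustrative only, not a real monitoring API.

```python
def schema_of(record):
    """A schema here is just the set of (field name, type name) pairs."""
    return {(k, type(v).__name__) for k, v in record.items()}

def detect_schema_change(known_schema, record):
    """Return (added, removed) field/type pairs relative to the known schema."""
    current = schema_of(record)
    return current - known_schema, known_schema - current

known = schema_of({"user_id": 1, "amount": 9.99})
# Upstream silently starts sending "amount" as a string:
added, removed = detect_schema_change(known, {"user_id": 1, "amount": "9.99"})
# The type change surfaces as one pair added and one removed
```

In a real pipeline this check would run on every batch and fire an alert the moment `added` or `removed` is non-empty, which is exactly the kind of monitoring that stops scaling when done by hand.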
Secure and Manage Access: Security is especially important when ingesting data from, or sending data to, third parties. Attribute-level access management is required, and data governance policies must be enforced. With many data sources, the need for a central “command and control” becomes clear.
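Attribute-level access management can be sketched as a central policy that maps each role to the fields it may see, with records filtered before delivery. The roles and fields below are hypothetical examples, not a prescribed policy model.

```python
# Central policy: which attributes each role is allowed to access.
POLICY = {
    "analyst": {"user_id", "country", "amount"},
    "partner": {"country", "amount"},  # third parties never see user_id
}

def redact(record, role, policy=POLICY):
    """Keep only the attributes the role is allowed to access."""
    allowed = policy.get(role, set())  # unknown roles get nothing
    return {k: v for k, v in record.items() if k in allowed}

row = {"user_id": 123, "country": "US", "amount": 9.99, "email": "a@b.co"}
partner_view = redact(row, "partner")
# partner_view == {"country": "US", "amount": 9.99}
```

Keeping the policy in one place, rather than scattered across pipelines, is what “command and control” means in practice: one table to audit, one table to change.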
Ease of Use and Discoverability: Since the ultimate goal of DataOps is to drive business value, it is important for business line users to have access to data. Beyond proper access, they need to be able to easily retrieve the data they need and analyze it with their preferred tools. This can mean exporting to another database to query with SQL or even Excel. It can also mean feeding the right data to an analytics platform like Tableau or Looker. Finally, discoverability and data mapping are critical as data becomes democratized. Understanding what data is available, and its schema, is the first step toward business-user analysis.
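One lightweight way to picture discoverability is a data catalog: a registry that lets a business user search for datasets and inspect their schemas before pulling anything into SQL, Excel, or a BI tool. Everything below is an illustrative sketch, not a particular product’s API.

```python
catalog = {}

def register(name, schema, description):
    """Publish a dataset's schema and description to the catalog."""
    catalog[name] = {"schema": schema, "description": description}

def discover(keyword):
    """Find datasets whose name or description mentions the keyword."""
    keyword = keyword.lower()
    return sorted(
        name for name, meta in catalog.items()
        if keyword in name.lower() or keyword in meta["description"].lower()
    )

register("ad_clicks", {"ts": "timestamp", "campaign": "str", "clicks": "int"},
         "Daily click counts per ad campaign")
register("revenue", {"day": "date", "amount": "float"},
         "Daily revenue totals")

found = discover("campaign")
# found == ["ad_clicks"]; catalog["ad_clicks"]["schema"] then shows its fields
```

The schema lookup is the “data mapping” half of the requirement: before a user exports anything, they can see exactly which fields a dataset carries.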
The Zero Sum Game: Why Now is the Time for DataOps
As the volume, velocity, and variety of data increase, new tools and processes are needed to extract insight. The volume of data is expected to grow to 180 zettabytes in the next 10 years. Today’s tools, processes, and organizational structures aren’t equipped to handle this massive increase in data inputs, or the growing value expected from data outputs.
We need tools that help us automate the data journey so we can accelerate the time from source to value. The less time we spend moving data from its source to its destination (that is, on DataOps), the more time we have for value creation. It’s a zero-sum game.