The Next Generation of Data Management: Democratizing DataOps with Self Service

Kevin Petrie

This is a guest post by Kevin Petrie, VP of Research at Eckerson Group.

Kevin’s passion is to decipher what technology means to business leaders and practitioners. He has invested 25 years in technology, as an industry analyst, writer, instructor, product marketer, and services leader. A frequent public speaker and accomplished writer, Kevin has a decade of experience in data management and analytics.

Much like a barbell, DataOps joins two large things: data supply and data demand. Growth on both ends puts a lot of strain on the middle.

DataOps arose as a discipline in recent years to improve the efficiency, effectiveness and scalability of data delivery. It applies the principles of agile development, DevOps, lean manufacturing and total quality management to the people, process and technology that manage data pipelines.

Increasing Data Supply and Demand Strains DataOps

But rising data supply and demand expose cracks in the DataOps approach. Many projects fall short of expectations due to the technical complexity of data pipeline tools and a lack of integration with business processes, as my colleague Dave Wells articulated in his recent blog. BI analysts and data scientists often wait on isolated, overburdened data engineers to handle complex scripting for data ingestion, transformation and delivery. And there are never enough data engineers to prevent a backlog. This single-threaded process breaks service levels and undermines data quality. Many business-oriented users remain unserved because they lack intuitive tools to help themselves, and as a result the value of data for driving business decisions is diminished.

Enter Self Service and Collaboration

Self-service tools can restore some of the promise of DataOps by replacing scripts with automation and fostering cross-functional collaboration. They accelerate analytics projects, reduce risk and streamline effort, thereby enabling data-driven decisions across the enterprise without requiring data engineers to step in for every request. Here is a look at how stakeholders on both ends of the barbell might just benefit from self-service.

Data consumers can start to handle essential tasks on their own. Armed with the right tools and training, BI analysts and data-savvy business managers serve themselves and avoid the risk of misinterpreted or mishandled requirements. They design, execute and monitor standard data integration and preparation tasks with graphical interfaces, rather than waiting on data engineers to script those tasks for them. Data scientists also use graphical data pipeline tools, for example to pre-process large data sets for machine learning, further alleviating their dependence on data engineers. By applying their domain expertise, data consumers help execute projects faster, more efficiently and with lower risk.

Data engineers can focus on strategic work. With some of the essential but time-consuming tasks off their plate, data engineers finally find time to improve data pipeline efficiency, data quality and governance. For example, they assess new cloud analytics platforms and evaluate new data platforms. They work with data stewards and governance officers to institute data profiling and quality checks. Or they help design new data masking procedures to control the usage of Personally Identifiable Information (PII). For most organizations, this is how data engineers can best increase data pipeline efficiency.
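To make the data masking idea concrete, here is a minimal Python sketch of one possible approach: replacing PII fields with a salted hash so records stay joinable without exposing raw values. The field names, the salt, and the helper functions `mask_value` and `mask_record` are all hypothetical illustrations, not part of any particular product, and a real procedure would follow the organization's own governance policy.

```python
import hashlib

# Hypothetical set of fields treated as PII in this illustration.
PII_FIELDS = {"email", "phone"}


def mask_value(value, salt="demo-salt"):
    """Return a short, deterministic token in place of a raw PII value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_record(record, pii_fields=PII_FIELDS):
    """Copy a record, masking any non-empty field listed in pii_fields."""
    return {
        key: mask_value(value) if key in pii_fields and value else value
        for key, value in record.items()
    }


customer = {"name": "Ada", "email": "ada@example.com", "phone": "555-0100"}
masked = mask_record(customer)
```

Because the token is deterministic for a given salt, the same email always maps to the same masked value, so analysts can still count or join on it without seeing the original.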

Data consumers and data engineers can collaborate. While no tool can automate and standardize everything, new DataOps tools can streamline the process by facilitating collaboration. When BI analysts or data scientists need a custom data transformation job, they can get coaching from their data engineering counterparts. They might huddle over a catalog of prepared data sets, then collaborate with data engineers to develop and apply specialized filtering, enrichment or joins to select data sets. Data engineers have time to do this, improving the efficiency of custom work, because they spend less time on basic tasks.

Platform Case Study

Nexla offers a converged DataOps/Self-Service platform that empowers data engineers, BI analysts and data scientists to collaboratively manage the flow of data in analytics processes. As a result, data is managed much like a product on the assembly line. This platform seeks to remove the friction that slows data access and collaboration. Data engineers can focus on higher-value tasks, and BI analysts and data scientists can access and transform the data they need to generate reports and dashboards and build models.

What does this look like? Data engineers create schemas and provision table- or column-level data sets to folders for BI analysts and data scientists to browse, tag and preview via samples. BI analysts and data scientists then step in, applying their business domain expertise to transform and prepare data for analytics. They can use a GUI to perform lookups or filter via attribute ranges, then consume and share these data sets from cloud stores such as S3 buckets, using open formats such as XML and CSV, as part of integrated workflows. Business managers, meanwhile, use Nexla to define templates or rules for their team members to follow.
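For readers who want a feel for what a lookup and an attribute-range filter actually do, the following Python sketch shows the equivalent logic in plain code. It is not Nexla's API; the sample `orders` data, the `regions` lookup table, and the function names are assumptions made up for illustration, and a self-service GUI would hide these steps behind point-and-click controls.

```python
import csv
import io


def filter_by_range(rows, field, low, high):
    """Keep rows whose numeric `field` falls within [low, high]."""
    return [r for r in rows if low <= float(r[field]) <= high]


def apply_lookup(rows, key, lookup, new_field, default=""):
    """Enrich each row with a value looked up by the row's `key` field."""
    return [{**r, new_field: lookup.get(r[key], default)} for r in rows]


def to_csv(rows):
    """Serialize rows to CSV text, an open format that is easy to share."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data standing in for a provisioned data set.
orders = [
    {"order_id": "1", "region_code": "NA", "amount": "120.50"},
    {"order_id": "2", "region_code": "EU", "amount": "15.00"},
    {"order_id": "3", "region_code": "NA", "amount": "980.00"},
]
regions = {"NA": "North America", "EU": "Europe"}

in_range = filter_by_range(orders, "amount", 100, 1000)
prepared = apply_lookup(in_range, "region_code", regions, "region_name")
csv_text = to_csv(prepared)
```

The resulting `csv_text` could then be written to a shared location, such as a cloud storage bucket, for other team members to consume.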

Data engineers can also support governance efforts to assist data stewards. They can configure access controls, run data quality checks, track lineage and audit actions, and integrate with third-party solutions and frameworks. Data engineers and consumers alike assist governance because they reduce complexity by reusing one another’s templates, schemas and data sets.

The Squishy Middle

This case study illustrates a paradox of data management: effective DataOps teams both specialize and share responsibilities. Data engineers focus on their specialty, which is creating efficient and effective data pipelines. BI analysts and data scientists focus on their specialties, which are business operations and statistics. But they team up with one another, case by case, to define where one domain ends and the other begins. To get back to our barbell analogy, a successful DataOps strategy must provide the flexibility to bend in order to support the heavy, growing weights of data supply and demand.

Self-service and collaboration might just make this happen.

