Every few years, the data ecosystem gets excited about a new buzzword that will finally rescue us from our data challenges. We start with much fanfare, coining a new term. Customers feel that there is finally light at the end of the tunnel. Analysts prophesy, consultants implement, startups pop up.
Three to five years later, customer struggles have gone up rather than down, because data is a beast and it keeps getting more complicated. As an industry, we realize that in the excitement over new terminology we skipped the technical diligence needed to create a rigorous, future-proof solution. Then we come up with a new term, apply some of the lessons, swing the pendulum from one extreme to the other, and the cycle begins all over again. Consultants make money, analysts move on, and customers either go out of business after spending millions or continue to limp along.
Look Before You Leap
Data Mesh is a promising concept that has been gaining a lot of attention. It tries to address some real problems plaguing enterprises today, but enterprises need to think it through carefully to avoid getting burnt by shiny toy syndrome yet again.
As a practitioner who has built complex, petabyte-scale data systems, I always recommend a careful look at the finer details of any new trend. Here is a take on Data Mesh through that lens.
What Problems is Data Mesh Solving?
There are three main problems that the Data Mesh concept is trying to address:
- Data is not easily accessible to data consumers.
- Getting data takes too long and requires too much back and forth between data consumers and the data science and engineering teams.
- The data explosion has overwhelmed the central data teams tasked with being purveyors of data goods.
The idea of Data Mesh (in data management circles) centers on business domains controlling their own data and the associated data science projects. What is a business domain? Think of a business domain as a small department that deals with a very specific, small set of use cases.
The justification in support of the Data Mesh approach is that business people in a particular domain have the best understanding of their data. When they are working with data science or engineering teams outside the domain, it requires too much back and forth to get the right data or insights.
The Data Mesh Approach
The Data Mesh has two major concepts:
- Data Product – Ready to use, governed data products for the user
- Domain Ownership of Data Systems – Reduces dependency on central data teams (data science and engineering)
In the Data Mesh approach, a single domain becomes a “mini-enterprise” and gets the ability to control and self-serve all aspects of its data and data science projects. With ownership comes responsibility. Domains become responsible for treating the data they work with and manage as a Data Product. They are responsible for maintaining its quality, accuracy, freshness, and so on, as well as for letting other domains consume the data products.
Data Products Are a Great Approach, But Need a Modern Architecture
The distributed Data Mesh solution builds on the Data Product as a core entity. It provides a clean and powerful approach to solving the challenges that Data Mesh has set out to address. The Data Product is not a new concept, but it does need a new architectural approach that is both modern and future-proof.
The Data Mesh proposals today suffer from three weaknesses:
|Weaknesses in Current Data Mesh||A Future-Ready Data Mesh Approach|
|Need for Specialists: Domain-specific ETL, data lakes, and data tools will demand that departmental teams acquire expertise in complex data systems such as Kafka, Spark, and data lakes.||Logical Data Products: Data Products need to be logical entities rather than physical data and ETL jobs. This is a proven approach, similar to how we use containers to create logical servers: application developers get containers on demand, while in-house or cloud IT teams still maintain the required control.|
|Pendulum Swing: Countering dependence on a central team with an approach anchored on independence can be a short-lived success.||Collaboration: Allow data experts to collaborate with data system experts.|
|Copies of Data: More copies of data create a governance nightmare. This is further compounded for companies with multi-cloud and hybrid-cloud infrastructure.||Zero Copy: A set of logical data layers manifests actual data at the point of use, creating no intermediate copies.|
Extending the Logical Data Approach
Nexla has taken a Data Product approach to modern data management. We call these data products Nexsets. A Nexset contains the schema, samples, validation, audit logs, sharing controls, and collaboration features, but no real data.
Not only that, these Nexsets are auto-generated by leveraging the principles of a data fabric, i.e., intelligence gathered from both actual and inferred metadata. The result of intelligently auto-generating these logical data products is a 100% code-free, collaborative way of achieving a better data mesh.
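As a rough illustration of what "inferred metadata" can mean, the sketch below derives a schema from a handful of sample records and wraps it in a descriptor. This is not Nexla's implementation or API. The function names `infer_schema` and `describe_product`, the descriptor layout, and the sample records are all assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def infer_schema(samples: Iterable[dict]) -> Dict[str, dict]:
    """Infer per-field type names and nullability from sample records."""
    samples = list(samples)
    seen_types = defaultdict(set)     # field -> names of the non-null types observed
    present_count = defaultdict(int)  # field -> number of records with a non-null value
    for record in samples:
        for field, value in record.items():
            if value is not None:
                seen_types[field].add(type(value).__name__)
                present_count[field] += 1
    fields = {field for record in samples for field in record}
    return {
        field: {
            "types": sorted(seen_types[field]),
            # A field is nullable if it is missing or None in any sample.
            "nullable": present_count[field] < len(samples),
        }
        for field in sorted(fields)
    }


def describe_product(name: str, samples: List[dict], keep_samples: int = 2) -> dict:
    """Build a hypothetical descriptor: inferred schema plus a few sample rows."""
    return {
        "name": name,
        "schema": infer_schema(samples),
        "samples": samples[:keep_samples],
    }


samples = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": None},
    {"user_id": 3},
]
product = describe_product("users", samples)
```

Here `user_id` is inferred as a non-nullable `int` and `email` as a nullable `str`. A production system would presumably combine this kind of inference with declared metadata from the source, such as catalog entries and column types.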