A practical guide for Data Mesh implementation
In this article, I will share an overview of Data Mesh, a few key principles, and a couple of implementation approaches.
Why Data Mesh?
Key challenges in the current traditional Data and Analytics approach:
The traditional approach to data and analytics has a few recurring challenges:
- Most Data Warehouse implementations are centralised and monolithic
- Lack of composability in the traditional ELT/ETL (E-Extract, T-Transform, L-Load) approach
- Hyper-specialised silos within the business and information technology teams
The Data Mesh concept originated to confront a few failure symptoms:
- Failure to scale as the number of data consumers grows
- Failure to materialise data-driven value
A brief history: how the information landscape has evolved over 40 years
Application and Data Paradigm
To describe the evolution of the information and data processing landscape, we need to go back several decades, to 1952, when IBM introduced the mainframe as a large computer system. During the 1960s and 1970s, IBM mainframes dominated the large computer market.
Later, in 1989–90, the English scientist Tim Berners-Lee co-invented the World Wide Web with Robert Cailliau. The Web began to enter everyday use in 1993–1994, when websites for general use started to become available.
Many of these web applications are OLTP applications. OLTP, or online transactional processing, enables the real-time execution of large numbers of database transactions by large numbers of people, typically over the internet.
OLTP applications are not designed for the high-volume data processing that analytics and reporting require. A different kind of system, OLAP (online analytical processing), was needed to manage this workload.
OLAP is software for performing multidimensional analysis at high speed on large volumes of data from a unified, centralised data store, such as a data warehouse.
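As a rough illustration of the two workloads, the sketch below uses Python's built-in sqlite3 module. The orders table, its columns, and the helper names are assumptions made up for this example, and a single small database stands in for what would normally be a separate transactional system and data warehouse.

```python
import sqlite3

# One in-memory database stands in for both workloads (an assumption for brevity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)


def place_order(region: str, amount: float) -> int:
    """OLTP-style work: a short transaction that writes a single row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )
    return cur.lastrowid


def revenue_by_region() -> dict:
    """OLAP-style work: scan many rows and aggregate along a dimension."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
    return dict(rows)


for region, amount in [("EU", 10.0), ("EU", 15.0), ("US", 7.5)]:
    place_order(region, amount)

print(revenue_by_region())
```

The write path touches one row at a time, while the read path aggregates over the whole table; at scale, these two access patterns are usually served by different systems.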
Programming Paradigm
From the 1990s, object-oriented programming (OOP) also became popular, and C++ and Java applications were increasingly developed using OOP techniques.
OOP languages need four features. The first is the ability to create objects. The second is the ability to structure code through inheritance. The third is encapsulation (the ability to hide some data). The fourth is polymorphism (the ability to change the way a method behaves).
In OOP, objects are the merger of data and behaviour. Objects are the building blocks.
A different technique was required to separate data from behaviour. Service-oriented architecture (SOA), which emerged in the late 1990s, builds systems from services and so tends to separate data from behaviour.
SOA has been with us for a long time and has grown in popularity since it first appeared. It has also branched into several variants, including microservice architecture. While microservices dominate the landscape, reports of SOA's death have been greatly exaggerated.
Dr. Peter Rodgers used the term “Micro-Web-Services” in 2005 during a presentation on cloud computing.
Big-Data, IoT and Cloud Computing
Since 2006, cloud service providers (CSPs) have offered a range of cloud computing services that help organisations modernise their application landscape.
Since then, organisations have been developing new kinds of solutions and smart devices based on cloud computing.
In 1999, Kevin Ashton coined the term IoT (Internet of Things). Google Trends shows that interest in IoT really exploded in 2014, before reaching its peak in late 2016.
In October 2010, James Dixon, founder and former CTO of Pentaho, came up with the term "Data Lake."
Cloud service providers then started to offer services for storing and processing high-volume, high-velocity data, together with open-source Big Data technologies, which helped organisations develop complex data products using the Data Lake approach.
Another data modelling technique, Data Vault 2.0, became popular around 2015. Data Vault modelling is built from hubs, links, and satellites. This approach is still used when developing agile data products.
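To make the three building blocks concrete, here is a minimal sketch in Python. The class layout, the field names, and the choice of an MD5 hash over the normalised business key are assumptions for illustration only, not a complete or authoritative Data Vault implementation.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more normalised business keys."""
    normalised = "||".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Hub:
    """A hub holds only the business key of a core entity."""
    business_key: str
    record_source: str
    load_ts: datetime = field(default_factory=now)

    @property
    def key(self) -> str:
        return hash_key(self.business_key)


@dataclass
class Link:
    """A link records a relationship between two or more hubs."""
    hub_keys: tuple
    record_source: str
    load_ts: datetime = field(default_factory=now)


@dataclass
class Satellite:
    """A satellite stores descriptive attributes of a hub (or link) over time."""
    parent_key: str
    attributes: dict
    record_source: str
    load_ts: datetime = field(default_factory=now)


# Hypothetical sample data: a customer, an order, and the fact that one placed the other.
customer = Hub("C-001", record_source="crm")
order = Hub("O-9001", record_source="webshop")
placed = Link((customer.key, order.key), record_source="webshop")
customer_details = Satellite(customer.key, {"name": "Ada", "city": "Leeds"}, "crm")
```

Keeping business keys, relationships, and descriptive attributes in separate structures is what lets new sources and attributes be added without reworking what is already loaded.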
Data Mesh
Over the last two decades, we have seen organisations work with a Data Warehouse, a Data Lake, or a hybrid of the two. We have also seen the application landscape move from monolithic to service-oriented to microservice architecture, and organisations shift from a project to a product mindset.
A new concept, "Data Mesh" (often discussed alongside "Data Fabric"), started to emerge in 2019 to resolve the key challenges of the traditional approach described at the beginning of the article.
The key challenges in the traditional approach are highlighted in the table below:
Data Mesh is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments — within or across organizations. There are four simple principles that can capture what underpins the logical architecture and operating model of Data Mesh.
Data Mesh Principles
4 Principles of Data Mesh Architecture
- Domain ownership: Responsibility for modelling and providing important data is distributed to the people closest to it, providing access to the exact data they need, when they need it.
- Data as a product: Data is treated as a product like any other, complete with a data product owner, consumer consultations, release cycles, and quality and service-level agreements.
- Self-service: Empower consumers to independently search, discover, and consume data products. Data product owners are provided standardized tools for populating and publishing their data products.
- Federated governance: This is embodied by a cross-organization team that provides global standards for the formats, modes, and requirements of publishing and using data products. This team must maintain the delicate balance between centralized standards for compatibility and decentralized autonomy for true domain ownership.
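One way to picture how these principles meet in practice is a small product descriptor plus a catalogue, sketched below in Python. Every name here (DataProduct, Catalogue, the fields, and the list of allowed formats) is a hypothetical choice for illustration, not part of any Data Mesh standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    domain: str                 # domain ownership: the team closest to the data
    owner: str                  # data as a product: a named product owner
    description: str
    freshness_sla_hours: int    # part of the service-level agreement
    output_format: str          # checked against the federated standard below
    tags: set = field(default_factory=set)


class Catalogue:
    """Self-service entry point: owners publish, consumers search."""

    ALLOWED_FORMATS = {"parquet", "csv", "json"}  # hypothetical global standard

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        if product.output_format not in self.ALLOWED_FORMATS:
            raise ValueError(f"{product.output_format!r} is not an approved format")
        self._products[(product.domain, product.name)] = product

    def search(self, tag: str) -> list:
        """Return the names of all published products carrying the given tag."""
        return sorted(p.name for p in self._products.values() if tag in p.tags)


catalogue = Catalogue()
catalogue.publish(DataProduct(
    "orders_daily", "sales", "sales-data-team", "Daily order totals",
    freshness_sla_hours=24, output_format="parquet", tags={"orders", "customer"},
))
catalogue.publish(DataProduct(
    "churn_scores", "marketing", "marketing-analytics", "Weekly churn predictions",
    freshness_sla_hours=168, output_format="csv", tags={"customer"},
))

print(catalogue.search("customer"))
```

The descriptor carries the ownership and product information, the catalogue gives consumers a self-service search, and the format check is a small stand-in for a centrally agreed standard.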
Data Mesh Implementation Approach
The Data Mesh implementation approach has many similarities with microservice architecture principles (e.g., domain-driven design, product-first mindset, composable architecture, decentralised governance), so it is worth revisiting the microservice architecture and its principles while defining the Data Mesh implementation approach.
Data Mesh embodies domain-driven design, which is well established in the microservice architecture.
There are a few emerging data mesh design techniques that can be followed while defining the target architecture for the Data Mesh input data ports:
- Developing a separate analytics microservice that uses the same data source used or developed by the application microservices
- Repurposing the message hub and creating a new subscription for the analytics data product
- Developing a completely decoupled message-hub
- Developing a separate data virtualisation layer based on the same data source used by the application microservices
Data Mesh Implementation Approach — Pattern1
In this approach, analytics data products are built on separate analytics service APIs, and those APIs read from read replicas of the application data sources rather than from the primary databases.
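A minimal sketch of this pattern, using Python's built-in sqlite3 module, is shown below. The orders table and the spend_per_customer function are hypothetical, and copying the database with Connection.backup is only a stand-in for the replication a real database platform would provide.

```python
import sqlite3

# Primary database owned by the application microservice.
primary = sqlite3.connect(":memory:")
primary.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
with primary:
    primary.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("ada", 20.0), ("ada", 5.0), ("lin", 12.5)],
    )

# Stand-in for replication: copy the primary into a second database.
replica = sqlite3.connect(":memory:")
primary.backup(replica)
replica.execute("PRAGMA query_only = ON")  # the analytics side never writes


def spend_per_customer(conn: sqlite3.Connection) -> dict:
    """Analytics service endpoint: aggregates from whichever connection it is given."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    return dict(rows)


# The analytics API is only ever handed the replica.
print(spend_per_customer(replica))
```

Because the analytics service only sees the replica, heavy analytical queries do not compete with transactional traffic on the primary, at the cost of some replication lag.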
Data Mesh Implementation Approach — Pattern2
Analytics data products are developed on an event-based model. In this approach, the transaction events are read from the same message broker that the application uses, through a separate subscription dedicated to analytics product development.
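The idea of a second subscription on the same broker can be sketched with a tiny in-memory broker in Python. The Broker class and the topic and subscription names are made up for this example; a real deployment would use the subscription or consumer-group feature of its own messaging platform.

```python
from collections import defaultdict, deque


class Broker:
    """In-memory stand-in for a message broker with named subscriptions."""

    def __init__(self):
        # topic -> {subscription name -> queue of undelivered events}
        self._subscriptions = defaultdict(dict)

    def subscribe(self, topic: str, name: str) -> None:
        self._subscriptions[topic].setdefault(name, deque())

    def publish(self, topic: str, event: dict) -> None:
        # Every subscription receives its own copy of the event.
        for queue in self._subscriptions[topic].values():
            queue.append(dict(event))

    def poll(self, topic: str, name: str) -> list:
        """Return and remove all pending events for one subscription."""
        queue = self._subscriptions[topic][name]
        events = list(queue)
        queue.clear()
        return events


broker = Broker()
broker.subscribe("orders", "fulfilment")   # existing application subscription
broker.subscribe("orders", "analytics")    # new subscription for the data product

broker.publish("orders", {"order_id": 1, "amount": 20.0})
broker.publish("orders", {"order_id": 2, "amount": 12.5})

fulfilment_events = broker.poll("orders", "fulfilment")
analytics_events = broker.poll("orders", "analytics")
```

Each subscription keeps its own queue, so the analytics consumer can read at its own pace without taking events away from the application consumer.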
Data Mesh Implementation Approach — Pattern3
Analytics Data Products are developed based on the event-based architecture, completely decoupled from the microservice application.
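One possible reading of this pattern is sketched below in Python: a second hub dedicated to analytics, fed by a small bridge that forwards only an agreed set of fields. The MessageHub class, the topic names, and the ANALYTICS_FIELDS contract are all assumptions for illustration, and other ways of decoupling the hubs are equally plausible.

```python
from collections import defaultdict


class MessageHub:
    """Tiny in-memory hub: handlers are called synchronously on publish."""

    def __init__(self, name: str):
        self.name = name
        self._handlers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers[topic]:
            handler(event)


application_hub = MessageHub("application")
analytics_hub = MessageHub("analytics")

ANALYTICS_FIELDS = ("order_id", "amount")  # hypothetical published contract


def forward_to_analytics(event: dict) -> None:
    """Bridge: copy only the contracted fields onto the analytics hub."""
    public_event = {name: event[name] for name in ANALYTICS_FIELDS}
    analytics_hub.publish("orders.analytics", public_event)


# The bridge is the only component that knows about both hubs.
application_hub.subscribe("orders", forward_to_analytics)

# The data product subscribes to the analytics hub only.
received = []
analytics_hub.subscribe("orders.analytics", received.append)

application_hub.publish(
    "orders", {"order_id": 1, "amount": 20.0, "card_token": "tok_123"}
)
```

In this sketch the analytics consumers never connect to the application hub, and internal fields such as the card token never leave it.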
Data Mesh Implementation Approach — Pattern4
Analytics Data Products are developed on top of the data virtualisation layer. The data virtualisation layer is developed based on transactional data sources.
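A small sketch of a virtualisation layer is shown below, again using Python's sqlite3 module. Two attached in-memory databases stand in for separate transactional sources, and the table and view names are hypothetical; commercial data virtualisation tools work across heterogeneous systems, which this example does not attempt.

```python
import sqlite3

# One connection plays the role of the virtualisation layer. The two attached
# in-memory databases stand in for separate transactional sources.
layer = sqlite3.connect(":memory:")
layer.execute("ATTACH DATABASE ':memory:' AS crm")
layer.execute("ATTACH DATABASE ':memory:' AS sales")

layer.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
layer.execute(
    "CREATE TABLE sales.orders "
    "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
with layer:
    layer.executemany(
        "INSERT INTO crm.customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Lin")],
    )
    layer.executemany(
        "INSERT INTO sales.orders (customer_id, amount) VALUES (?, ?)",
        [(1, 20.0), (1, 5.0), (2, 12.5)],
    )

# A TEMP view may span attached databases; it stores the query, not the data.
layer.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS customer, SUM(o.amount) AS revenue
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
""")


def customer_revenue() -> list:
    """Output of the data product, computed from the sources at query time."""
    return layer.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


print(customer_revenue())
```

Nothing is copied into the view, so a new row in either source is visible the next time the data product is queried.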
Data Mesh Implementation Approach — Analytics Services
In the previous section, we saw that the input data ports can be developed using pub/sub, API, or data virtualisation techniques. The data products are developed within a business domain on top of these input data sources. In the middle, each data product applies descriptive (rule-based), predictive, or recommendation models to serve the analytics requirements of its domain. The output data ports are then served via reporting or data/API services.
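The flow from input ports through a model to an output port can be sketched as follows. The AnalyticsDataProduct class, the two input functions, and the spend threshold are hypothetical, and the model shown is a simple descriptive rule rather than a predictive or recommendation model.

```python
from collections import defaultdict


class AnalyticsDataProduct:
    """Input ports feed a model; the result is exposed through an output port."""

    def __init__(self, name, input_ports, model):
        self.name = name
        self.input_ports = input_ports  # callables that each return a list of records
        self.model = model              # descriptive, predictive or recommendation logic
        self._result = []

    def refresh(self) -> None:
        records = [record for port in self.input_ports for record in port()]
        self._result = self.model(records)

    def output_port(self) -> list:
        """What a report or a data/API service would read."""
        return list(self._result)


def orders_from_api():       # hypothetical API input port
    return [{"customer": "ada", "amount": 20.0}]


def orders_from_events():    # hypothetical pub/sub input port
    return [{"customer": "ada", "amount": 5.0}, {"customer": "lin", "amount": 12.5}]


def segment_customers(records, threshold=20.0):
    """Rule-based model: label customers by their total spend."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer"]] += record["amount"]
    return [
        {
            "customer": name,
            "total": total,
            "segment": "high" if total >= threshold else "standard",
        }
        for name, total in sorted(totals.items())
    ]


product = AnalyticsDataProduct(
    "customer-segments", [orders_from_api, orders_from_events], segment_customers
)
product.refresh()
print(product.output_port())
```

Swapping the rule for a trained model would change only the function passed as the model; the ports stay the same.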
This process will require emphasis on decentralised data governance to maintain consistency across different products in different domains.
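As a small illustration of what such governance might look like when automated, the sketch below applies the same global policies to product descriptors published by different domains. The three policies, the descriptor fields, and the sample products are assumptions invented for this example.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def require_owner(descriptor: dict) -> list:
    return [] if descriptor.get("owner") else ["no owner declared"]


def require_snake_case_columns(descriptor: dict) -> list:
    return [
        f"column {column!r} is not snake_case"
        for column in descriptor.get("columns", [])
        if not SNAKE_CASE.match(column)
    ]


def require_sla(descriptor: dict) -> list:
    return [] if descriptor.get("freshness_sla_hours") else ["no freshness SLA declared"]


# Agreed centrally, applied to every domain (hypothetical policy set).
GLOBAL_POLICIES = [require_owner, require_snake_case_columns, require_sla]


def audit(descriptors: list) -> dict:
    """Return the policy violations found for each data product."""
    return {
        descriptor["name"]: [
            message for policy in GLOBAL_POLICIES for message in policy(descriptor)
        ]
        for descriptor in descriptors
    }


products = [
    {"name": "orders_daily", "domain": "sales", "owner": "sales-data-team",
     "columns": ["order_id", "amount"], "freshness_sla_hours": 24},
    {"name": "campaigns", "domain": "marketing", "owner": "",
     "columns": ["CampaignId", "spend"], "freshness_sla_hours": 24},
]

report = audit(products)
```

The domains keep control of what they publish, while the shared checks keep naming, ownership, and service levels consistent across products.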
Using this approach, the domain concept can be extended from the digital to the data product, by adopting similar principles of microservice architecture.
Reference:
The Data Mesh principles are taken from the article by Zhamak Dehghani at https://martinfowler.com/articles/data-mesh-principles.html