Compare the architectures of data fabric vs data mesh, and learn how both approaches can be fused to create a versatile data management stack. Read on.
When comparing data fabric vs data mesh, it's important to start with the understanding that both are data management architectures. Simply put, data fabric is a multi-tech framework capable of many outputs – one of which is data products. Data mesh is an architecture whose sole purpose is to deliver business-driven data products.
A data product is produced to be consumed with a specific purpose in mind. It may assume different forms, depending on the specific use case or business domain it addresses. It usually corresponds to a specific business entity – such as a customer, vendor, order, credit card, or product – that data consumers would like to access for operational and analytical workloads.
The data for the product is typically collected from many different siloed source systems, often in different formats, structures, technologies, and terminologies.
A data product essentially encapsulates everything that a data consumer needs to generate value from the business entity's data.
This includes the data product's metadata:
Schema of all the data tables and relationships
Logic for processing the ingested, raw data
Access methods, such as SQL, JDBC, web services, streaming, and CDC
Synchronization rules defining how and when data is synced with the source systems
Orchestrated data flows that prepare the data for delivery
Lineage back to the source systems
Access controls, including credential checking and authentication
And the data product's data:
Uniquely instantiated, based on the data product's metadata
Unified, cleansed, and masked via data masking tools
Enriched with insights from real-time and offline analytics
Persisted, virtualized, or cached
Made auditable, with an audit log of all data changes and active metadata (data product performance and usage)
The data product's definition and data are managed separately: a product has a single definition, and multiple instances of its data – one for each individual business entity, such as each customer (see the sketch below).
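To make that split concrete, here is a minimal Python sketch of one shared definition with many per-entity data instances. All names (DataProductDefinition, DataProductInstance, and the example fields) are illustrative assumptions, not taken from any particular platform.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataProductDefinition:
    """The single, shared definition: the data product's metadata."""
    name: str                 # business entity, e.g. "customer"
    schema: dict              # table/column structure and relationships
    access_methods: list      # e.g. ["SQL", "JDBC", "web_service"]
    sync_rule: str            # how and when data syncs with sources
    lineage: list             # source systems the data is collected from

@dataclass
class DataProductInstance:
    """One data instance per business entity (e.g. per customer)."""
    definition: DataProductDefinition  # all instances share one definition
    entity_id: str
    data: dict = field(default_factory=dict)

# One definition...
customer_def = DataProductDefinition(
    name="customer",
    schema={"customer_id": "str", "name": "str", "orders": "list"},
    access_methods=["SQL", "web_service"],
    sync_rule="CDC, near real time",
    lineage=["crm_db", "billing_db", "web_orders"],
)

# ...and many instances, one per customer.
instances = [DataProductInstance(customer_def, cid) for cid in ("c-001", "c-002")]
```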
The data product is created by applying a product lifecycle methodology to data. The data product delivery lifecycle adheres to agile principles of being short and iterative, to deliver quick, incremental value to data consumers.
A data product approach entails:
Definition and design
Data product requirements are defined in the context of business objectives, data privacy and governance constraints, and existing data asset inventories. Data product design determines how the data will be structured, and how it will be componentized as a product for consumption via services.
Engineering
Data products are engineered by identifying, integrating, and collating the data from its sources, and then employing data masking, as needed. APIs for data services are created to give consuming applications authorized access to the data product, and a data pipeline is built to prepare the datasets and deliver them to the data consumers, or apps, that requested them.
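As a rough illustration of this engineering step, the sketch below collates records from two hypothetical siloed sources (crm_rows and billing_rows), masks a sensitive field before delivery, and produces a unified per-customer dataset. The source names and the choice of an unsalted SHA-256 digest for masking are assumptions made for the example, not a prescribed implementation.

```python
import hashlib

# Hypothetical siloed sources, each with its own field names and formats.
crm_rows = [{"cust_id": "c-001", "email": "ann@example.com", "name": "Ann"}]
billing_rows = [{"customer": "c-001", "balance_cents": 12050}]

def mask(value: str) -> str:
    """Irreversibly mask a sensitive value (unsalted SHA-256, for illustration)."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def build_customer_dataset(crm, billing):
    """Integrate and collate per-customer data, masking PII along the way."""
    by_id = {}
    for row in crm:
        by_id[row["cust_id"]] = {
            "customer_id": row["cust_id"],
            "name": row["name"],
            "email_masked": mask(row["email"]),  # mask before delivery
        }
    for row in billing:
        by_id.setdefault(row["customer"], {})["balance"] = row["balance_cents"] / 100
    return list(by_id.values())

print(build_customer_dataset(crm_rows, billing_rows))
```

In practice, the data service APIs and pipeline orchestration would sit on top of a function like build_customer_dataset, delivering its output to the consumers that requested it.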
Quality assurance
Test data management tools are used to validate the data for its completeness, compliance, and freshness – and to make sure it can be securely consumed by applications at massive scale.
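Here is a minimal flavor of such checks in Python, with the required fields and the 24-hour freshness threshold chosen arbitrarily for the example:

```python
from datetime import datetime, timedelta, timezone

def validate(rows, required_fields, max_age=timedelta(hours=24)):
    """Check completeness (no missing required fields) and freshness."""
    issues = []
    for i, row in enumerate(rows):
        for f in required_fields:
            if not row.get(f):
                issues.append(f"row {i}: missing '{f}'")          # completeness
        synced = row.get("synced_at")
        if synced and datetime.now(timezone.utc) - synced > max_age:
            issues.append(f"row {i}: stale (synced {synced})")    # freshness
    return issues

rows = [{"customer_id": "c-001", "synced_at": datetime.now(timezone.utc)}]
assert validate(rows, required_fields=["customer_id"]) == []
```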
Support and maintenance
Data usage, pipeline performance, and reliability are continually monitored by domain-level authorities and data engineers, so issues can be addressed as they arise.
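One lightweight way to capture that kind of usage and performance telemetry is to instrument each pipeline step, as in this hypothetical sketch (the run_log structure and step names are assumptions for the example):

```python
import time
from functools import wraps

run_log = []  # active metadata: usage and performance, per pipeline step

def monitored(step):
    """Record the duration and output size of a pipeline step."""
    @wraps(step)
    def wrapper(rows):
        start = time.perf_counter()
        out = step(rows)
        run_log.append({
            "step": step.__name__,
            "seconds": round(time.perf_counter() - start, 4),
            "rows_out": len(out),
        })
        return out
    return wrapper

@monitored
def dedupe(rows):
    return list({r["customer_id"]: r for r in rows}.values())

dedupe([{"customer_id": "c-001"}, {"customer_id": "c-001"}])
print(run_log)  # engineers review this log to spot slow or shrinking steps
```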
Management
Just as a software product manager is responsible for defining user needs, prioritizing them, and then working with development and QA teams to ensure delivery, the data product approach calls for a similar role. The data product manager is responsible for delivering business value and ROI, where measurable objectives – such as response times for operational insights, or the pace of application development – have definitive goals or timelines based on SLAs agreed between business and IT.
Where data fabric combines multiple technologies to design a centralized, metadata-driven, augmented (AI-driven) data platform, data mesh is a decentralized architecture and operating model that guides the design within a framework that is independent of technology.
Design foundations
Data fabric continually uses active metadata (data usage and performance) and centralized data engineering to optimize the data management infrastructure based on the combined experience of data consumers across the enterprise. Data mesh makes use of current business expertise, distributed in business domains, to design (i.e. predefine) and create data products.
Cross-dependence
Both data fabric and data mesh rely on distributed data governance tools and authority, so their design principles are similar. However, a data fabric architecture can be built without adopting a data mesh architecture, while data mesh must rely on the discovery and data analysis principles of data fabric to create data products.
Data treatment
Data fabric identifies and tracks repeated use cases, treating reused data assets as potential contributors that can augment, refine, and resolve data authority. Data mesh relies on the original data sources and systems, in which data assets are designed and captured, to create data products with a business-centric context.
Human intervention
Data fabric depends on cross-platform data management, centralized data engineering teams, and augmented (AI-based) orchestration to minimize the need for IT involvement in design, deployment and maintenance. In contrast, the data mesh is based on manual design and orchestration of existing systems, with ongoing maintenance federated to business domains.
How are they different?
Data fabric relies on the efficiency and capabilities of current (centralized) data management tools. Data mesh shifts the architecture design towards distributed data services and a federated operating model.
How are they the same?
Both data fabric and data mesh represent the culmination of more than 50 years of data management technology experience. Both can adapt to, and make use of, the practices of the other. The total cost of both frameworks is similar, relative to design and deployment. However, implementing advanced AI capabilities in the data fabric may prove to drive cost-efficiencies in ongoing maintenance.
Here’s a summary of the differences and similarities of data fabric vs data mesh, adapted from analyst firm Gartner's report "Are Data Fabric and Data Mesh the Same or Different?":
| | Data Fabric | Data Mesh |
|---|---|---|
| Data products | One possible output among many | The sole purpose of the architecture |
| Centralization | Centralized, metadata-driven design | Decentralized design with a federated operating model |
| Deployment | Augmented (AI-driven) orchestration, minimizing IT involvement | Manual design and orchestration, maintained by the business domains |
| Data quality and stewardship | Managed by centralized data engineering teams | Federated to data SMEs in the business domains |
| Common architecture principles | Both rely on distributed data governance, and each can adapt to, and make use of, the other's practices | |
Data Product Platform incorporates the data product concepts of data mesh, and the decentralization of data engineering, governance and control to business domains, while affording centralized control over certain aspects of data quality, compliance, security, and residency. Here’s how:
Data products inside
Data Product Platform is the runtime platform for executing data products. The data products are defined and managed by data SMEs in the business domains, and then made available to all authorized data consumers across the enterprise.
A micro-database for every data product
The data for each data product instance is packaged in its own secure Micro-Database™, continuously in sync with all source systems, and instantly accessible to data consumers. For example, a B2C enterprise with 100 million consumers would have its customer data organized in 100 million separate Micro-Databases, managed by a customer data product.
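To make the one-micro-database-per-entity pattern tangible, here is a toy Python sketch that keeps one isolated SQLite database per customer ID. This is only an analogy for the pattern; it says nothing about how the Micro-Database™ technology itself is implemented.

```python
import sqlite3

class MicroDBStore:
    """Toy pattern: one isolated database per business entity instance."""
    def __init__(self):
        self._dbs = {}  # entity_id -> its own in-memory SQLite database

    def db_for(self, entity_id):
        if entity_id not in self._dbs:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE orders (order_id TEXT, total REAL)")
            self._dbs[entity_id] = conn
        return self._dbs[entity_id]

store = MicroDBStore()
store.db_for("c-001").execute("INSERT INTO orders VALUES ('o-1', 99.5)")
# Each customer's data lives in its own store, addressed by entity ID:
print(store.db_for("c-001").execute("SELECT * FROM orders").fetchall())
```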
Trusted data, all the time
A data product platform delivers a trusted, real-time view of any business entity – for example, customer, supplier, loan, order, or employee. It deploys in weeks, scales linearly, and adapts to change on the fly. It supports data fabric, data mesh, and data hub architectures – on premises, in the cloud, or across hybrid environments.
Endless use cases at a fraction of the time and cost
The platform drives many use cases, including Customer 360, data migration, test data management, legacy application modernization, and more – to deliver business outcomes at less than half the time and cost of any alternative.