Compare the architectures of data fabric vs data mesh, and learn how both approaches can be fused to create a versatile data management stack. Read on.
When comparing data fabric vs data mesh, it's important to start with the understanding that both are data management architectures. Simply put, data fabric is a multi-tech framework capable of many outputs – one of which is data products. Data mesh is an architecture whose sole purpose is to deliver business-driven data products.
A data product is produced to be consumed with a specific purpose in mind. It may assume different forms, depending on the specific use case or business domain it addresses. It usually corresponds to a specific business entity – such as a customer, vendor, order, credit card, or product – that data consumers would like to access for operational and analytical workloads.
The data for the product is typically collected from many different siloed source systems, often in different formats, structures, technologies, and terminologies.
A data product essentially encapsulates everything that a data consumer needs to generate value from the business entity's data.
This includes the data product's metadata:
Schema of all the data tables and relationships
Logic for processing the ingested, raw data
Access methods, such as SQL, JDBC, web services, streaming, and CDC
Synchronization rules defining how and when data is synced with the source systems
Orchestrated data flows that prepare the data for delivery
Lineage back to the source systems
Access controls, including credential checking and authentication
And the data product's data:
Uniquely instantiated, based on the data product's metadata
Unified, cleansed, and masked via data masking tools
Enriched with insights from real-time and offline analytics
Persisted, virtualized, or cached
Made auditable, with an audit log of all data changes and active metadata (data product performance and usage)
The data product's definition and data are managed separately: a product has a single definition, and multiple instances of its data – one for each individual business entity, such as each customer (see the sketch below).
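To make that split concrete, here is a minimal Python sketch of one shared definition with many per-entity data instances. All names (DataProductDefinition, DataProductInstance, and the example fields) are illustrative assumptions, not taken from any particular platform.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataProductDefinition:
    """The single, shared definition: the data product's metadata."""
    name: str                 # business entity, e.g. "customer"
    schema: dict              # table/column structure and relationships
    access_methods: list      # e.g. ["SQL", "JDBC", "web_service"]
    sync_rule: str            # how and when data syncs with sources
    lineage: list             # source systems the data is collected from

@dataclass
class DataProductInstance:
    """One data instance per business entity (e.g. per customer)."""
    definition: DataProductDefinition  # all instances share one definition
    entity_id: str
    data: dict = field(default_factory=dict)

# One definition...
customer_def = DataProductDefinition(
    name="customer",
    schema={"customer_id": "str", "name": "str", "orders": "list"},
    access_methods=["SQL", "web_service"],
    sync_rule="CDC, near real time",
    lineage=["crm_db", "billing_db", "web_orders"],
)

# ...and many instances, one per customer.
instances = [DataProductInstance(customer_def, cid) for cid in ("c-001", "c-002")]
```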
The data product is created by applying a product lifecycle methodology to data. The data product delivery lifecycle adheres to agile principles of being short and iterative, to deliver quick, incremental value to data consumers.
A data product approach entails:
Definition and design
Data product requirements are defined in the context of business objectives, data privacy and governance constraints, and existing data asset inventories. Data product design determines how the data will be structured, and how it will be componentized as a product for consumption via services.
Engineering
Data products are engineered by identifying, integrating, and collating the data from its sources, and then employing data masking, as needed. APIs for data services are created to give consuming applications authorized access to the data product, and a data pipeline is built to prepare the datasets and deliver them to the data consumers, or apps, that requested them.
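As a rough illustration of this engineering step, the sketch below collates records from two hypothetical siloed sources (crm_rows and billing_rows), masks a sensitive field before delivery, and produces a unified per-customer dataset. The source names and the choice of an unsalted SHA-256 digest for masking are assumptions made for the example, not a prescribed implementation.

```python
import hashlib

# Hypothetical siloed sources, each with its own field names and formats.
crm_rows = [{"cust_id": "c-001", "email": "ann@example.com", "name": "Ann"}]
billing_rows = [{"customer": "c-001", "balance_cents": 12050}]

def mask(value: str) -> str:
    """Irreversibly mask a sensitive value (unsalted SHA-256, for illustration)."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def build_customer_dataset(crm, billing):
    """Integrate and collate per-customer data, masking PII along the way."""
    by_id = {}
    for row in crm:
        by_id[row["cust_id"]] = {
            "customer_id": row["cust_id"],
            "name": row["name"],
            "email_masked": mask(row["email"]),  # mask before delivery
        }
    for row in billing:
        by_id.setdefault(row["customer"], {})["balance"] = row["balance_cents"] / 100
    return list(by_id.values())

print(build_customer_dataset(crm_rows, billing_rows))
```

In practice, the data service APIs and pipeline orchestration would sit on top of a function like build_customer_dataset, delivering its output to the consumers that requested it.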
Quality assurance
Test data management tools are used to validate the data for its completeness, compliance, and freshness – and to make sure it can be securely consumed by applications at massive scale.
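Here is a minimal flavor of such checks in Python, with the required fields and the 24-hour freshness threshold chosen arbitrarily for the example:

```python
from datetime import datetime, timedelta, timezone

def validate(rows, required_fields, max_age=timedelta(hours=24)):
    """Check completeness (no missing required fields) and freshness."""
    issues = []
    for i, row in enumerate(rows):
        for f in required_fields:
            if not row.get(f):
                issues.append(f"row {i}: missing '{f}'")          # completeness
        synced = row.get("synced_at")
        if synced and datetime.now(timezone.utc) - synced > max_age:
            issues.append(f"row {i}: stale (synced {synced})")    # freshness
    return issues

rows = [{"customer_id": "c-001", "synced_at": datetime.now(timezone.utc)}]
assert validate(rows, required_fields=["customer_id"]) == []
```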
Support and maintenance
Data usage, pipeline performance, and reliability are continually monitored by domain-level authorities and data engineers, so issues can be addressed as they arise.
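One lightweight way to capture that kind of usage and performance telemetry is to instrument each pipeline step, as in this hypothetical sketch (the run_log structure and step names are assumptions for the example):

```python
import time
from functools import wraps

run_log = []  # active metadata: usage and performance, per pipeline step

def monitored(step):
    """Record the duration and output size of a pipeline step."""
    @wraps(step)
    def wrapper(rows):
        start = time.perf_counter()
        out = step(rows)
        run_log.append({
            "step": step.__name__,
            "seconds": round(time.perf_counter() - start, 4),
            "rows_out": len(out),
        })
        return out
    return wrapper

@monitored
def dedupe(rows):
    return list({r["customer_id"]: r for r in rows}.values())

dedupe([{"customer_id": "c-001"}, {"customer_id": "c-001"}])
print(run_log)  # engineers review this log to spot slow or shrinking steps
```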
Management
Just as a software product manager is responsible for defining user needs, prioritizing them, and then working with development and QA teams to ensure delivery, the data product approach calls for a similar role. The data product manager is responsible for delivering business value and ROI, where measurable objectives – such as response times for operational insights, or the pace of application development – have definitive goals or timelines based on SLAs agreed between business and IT.
Where data fabric combines multiple technologies to design a centralized, metadata-driven, augmented (AI-driven) data platform, data mesh is a decentralized architecture and operating model that guides the design within a framework that is independent of technology.
Design foundations
Data fabric continually uses active metadata (data usage and performance) and centralized data engineering to optimize the data management infrastructure based on the combined experience of data consumers across the enterprise. Data mesh makes use of current business expertise, distributed in business domains, to design (i.e. predefine) and create data products.
Cross-dependence
Both data fabric and data mesh rely on distributed data governance tools and authority, so their design principles are similar. However, a data fabric architecture can be built without adopting a data mesh architecture, while data mesh must rely on the discovery and data analysis principles of data fabric to create data products.
Data treatment
Data fabric identifies and tracks repeated use cases, treating reused data assets as potential contributors that can augment, refine, and resolve data authority. Data mesh relies on the original data sources and systems, in which data assets are designed and captured, to create data products with a business-centric context.
Human intervention
Data fabric depends on cross-platform data management, centralized data engineering teams, and augmented (AI-based) orchestration to minimize the need for IT involvement in design, deployment and maintenance. In contrast, the data mesh is based on manual design and orchestration of existing systems, with ongoing maintenance federated to business domains.
How are they different?
Data fabric relies on the efficiency and capabilities of current (centralized) data management tools. Data mesh shifts the architecture design towards distributed data services and a federated operating model.
How are they the same?
Both data fabric and data mesh represent the culmination of more than 50 years of data management technology experience. Both can adapt to, and make use of, the practices of the other. The total cost of both frameworks is similar, relative to design and deployment. However, implementing advanced AI capabilities in the data fabric may prove to drive cost-efficiencies in ongoing maintenance.
Here’s a summary of the differences and similarities of data fabric vs data mesh, adapted from analyst firm Gartner's report "Are Data Fabric and Data Mesh the Same or Different?":
| | Data Fabric | Data Mesh |
|---|---|---|
| Data products | One possible output among many | The sole purpose of the architecture |
| Centralization | Centralized, metadata-driven design | Decentralized design with a federated operating model |
| Deployment | Augmented (AI-driven) orchestration, minimizing IT involvement | Manual design and orchestration, maintained by the business domains |
| Data quality and stewardship | Managed by centralized data engineering teams | Federated to data SMEs in the business domains |
| Common architecture principles | Both rely on distributed data governance, and each can adapt to, and make use of, the other's practices | |
Data Product Platform incorporates the data product concepts of data mesh, and the decentralization of data engineering, governance and control to business domains, while affording centralized control over certain aspects of data quality, compliance, security, and residency. Here’s how:
Data products inside
Data Product Platform is the runtime platform for executing data products. The data products are defined and managed by data SMEs in the business domains, and then made available to all authorized data consumers across the enterprise.
A micro-database for every data product
The data for each data product instance is packaged in its own secure Micro-Database™, continuously in sync with all source systems, and instantly accessible to data consumers. For example, a B2C enterprise with 100 million consumers would have its customer data organized in 100 million separate Micro-Databases, managed by a customer data product.
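To make the one-micro-database-per-entity pattern tangible, here is a toy Python sketch that keeps one isolated SQLite database per customer ID. This is only an analogy for the pattern; it says nothing about how the Micro-Database™ technology itself is implemented.

```python
import sqlite3

class MicroDBStore:
    """Toy pattern: one isolated database per business entity instance."""
    def __init__(self):
        self._dbs = {}  # entity_id -> its own in-memory SQLite database

    def db_for(self, entity_id):
        if entity_id not in self._dbs:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE orders (order_id TEXT, total REAL)")
            self._dbs[entity_id] = conn
        return self._dbs[entity_id]

store = MicroDBStore()
store.db_for("c-001").execute("INSERT INTO orders VALUES ('o-1', 99.5)")
# Each customer's data lives in its own store, addressed by entity ID:
print(store.db_for("c-001").execute("SELECT * FROM orders").fetchall())
```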
Trusted data, all the time
A data product platform delivers a trusted, real-time view of any business entity – for example, customer, supplier, loan, order, or employee. It deploys in weeks, scales linearly, and adapts to change on the fly. It supports data fabric, data mesh, and data hub architectures – on premises, in the cloud, or across hybrid environments.
Endless use cases at a fraction of the time and cost
The platform drives many use cases, including Customer 360, data migration, test data management, legacy application modernization, and more – to deliver business outcomes at less than half the time and cost of any alternative.