Data as a product, a core principle of the data mesh model, realizes its full potential in a generative data product platform. Learn what, why, and how.
The Proliferation of Data
What is a Data-Driven Enterprise?
What is a Data Product?
The 7 Top Benefits of Data Products
Data as a Product and the Data Delivery Lifecycle
A New Role: Data Product Manager
Best Practices for Data as a Product
The Business Entity – the Logic Behind Data as a Product
A Generative Platform Based on Business Entities
Data Product Platform: Data as a Product Inside
As digitization grows, so does the amount of data that’s available to an enterprise. The sheer volume of digital products, services, and business models, combined with greater connectivity to devices, has led data to proliferate exponentially. With 90% of the world’s data created in the past 2 years, enterprises are becoming more and more data-driven.
According to McKinsey, data-driven companies are 23 times more likely to acquire customers and 19 times more likely to be profitable.
A data-driven enterprise maximizes the value of its data by treating its data as a product, and differentiating data based on its overall quality (e.g., completeness, availability, accessibility, and general fitness for use). It treats data as a product in order to drive business outcomes, for example:
A telco predicting likelihood to churn in real time, during a customer interaction
A media company serving personalized content to its subscribers
A bank promoting a new financial product to a targeted client segment
Data products are a foundational concept of the data mesh.
A data product is created with a specific purpose in mind, to make a trusted dataset accessible to authorized data consumers. It encapsulates everything a data consumer needs to generate value from the data. Common examples include:
Delivering a customer 360 dataset to a CRM application, including transactions, interactions, and master data
Tokenizing sensitive customer information for use by operational and analytical systems
Pipelining retail inventory data from chain stores into a central data lake for AI/ML analysis
Preparing a masked test dataset and integrating it with a CI/CD pipeline, in support of the agile delivery of a wealth management system
Data products often correspond to business entities, such as customers, suppliers, devices, locations, or warehouses. Since a business entity's data is often scattered across many different source systems, a data product requires data integration tools to unify its data and keep it synchronized with the underlying source systems.
The data product comprises its definition (metadata) and the resulting dataset (once instantiated).
The definition consists of:
Static metadata, encompassing all the relevant tables and fields that capture the data product's data
Active metadata, including data product usage and performance
Synchronization rules, defining how and when data is synced with the source systems
Algorithms that transform, process, enrich, and mask the raw data
Data connectors that ingest the source data into the data product and deliver the dataset to data consumers; for example: JDBC, web services, Kafka, CDC, messaging, virtualization
Data governance policies, ensuring that data governance (quality and privacy compliance) is enforced according to internal and external regulations
Access controls, including credential validation and authentication
The dataset, once instantiated, is:
Managed as a unit, making it easy to process and access
Unified, cleansed, masked, and enriched
Persisted, virtualized, or cached
Auditable, ensuring data changes are logged in an audit log
The data product's definition and data are managed separately, with a data product having a single definition, and multiple instances of its data.
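The split between a single definition and multiple data instances can be sketched in Python. This is a minimal illustration, not any platform's actual API: the class names `DataProductDefinition` and `DataProductInstance`, the `mask_email` transform, and the role names are all hypothetical, and only a few of the components described above (fields, a sync rule, transforms, access control, an audit log) are modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


def mask_email(record: Record) -> Record:
    """Hypothetical masking transform: hide the local part of an email."""
    masked = dict(record)
    email = masked.get("email")
    if isinstance(email, str) and "@" in email:
        masked["email"] = "***@" + email.split("@", 1)[1]
    return masked


@dataclass
class DataProductDefinition:
    """Hypothetical definition (metadata) shared by all instances."""
    name: str
    fields: List[str]          # static metadata: which fields are captured
    sync_rule: str             # e.g. "on_change"; not enforced in this sketch
    transforms: List[Callable[[Record], Record]] = field(default_factory=list)
    allowed_roles: frozenset = frozenset()

    def instantiate(self, entity_id: str, raw: Record) -> "DataProductInstance":
        # Keep only the defined fields, then apply each transform in order.
        data: Record = {name: raw.get(name) for name in self.fields}
        for transform in self.transforms:
            data = transform(data)
        return DataProductInstance(self, entity_id, data)


@dataclass
class DataProductInstance:
    """One dataset produced from a definition, for one business entity."""
    definition: DataProductDefinition
    entity_id: str
    data: Record
    audit_log: List[str] = field(default_factory=list)

    def read(self, role: str) -> Record:
        # Access control and auditing happen on every read attempt.
        if role not in self.definition.allowed_roles:
            self.audit_log.append(f"denied:{role}")
            raise PermissionError(f"role {role!r} may not read {self.definition.name}")
        self.audit_log.append(f"read:{role}")
        return dict(self.data)


# One definition, two instances of its data.
customer = DataProductDefinition(
    name="customer",
    fields=["name", "email"],
    sync_rule="on_change",
    transforms=[mask_email],
    allowed_roles=frozenset({"crm"}),
)
john = customer.instantiate(
    "C-1", {"name": "John Smith", "email": "john@example.com", "ssn": "000"}
)
jane = customer.instantiate(
    "C-2", {"name": "Jane Doe", "email": "jane@example.org"}
)
```

Both instances point at the same definition object, so a change to the definition (for example, adding a transform) would apply to every instance created afterwards.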
By taking the "data as a product" approach, organizations can enjoy the following benefits:
Ensure their data initiatives are business-driven and outcome-focused
Democratize data access to authorized data consumers across the organization
Enable agile delivery of incremental value through data
Provide a common language between business and IT
Achieve efficiencies through reuse of data products across use cases
Elevate the organization's trust in data
Future-proof data architectures (data mesh architecture / data fabric architecture / data hub architecture)
To take a "Data as a Product" approach, data teams must adopt a cross-functional product lifecycle approach to data. The data product delivery lifecycle should follow agile principles, by being short and iterative – to deliver quick, incremental value to authorized data consumers.
In software product development, the software product manager is responsible for gathering user needs, prioritizing them, and working with software development and QA to ensure the right product is delivered at the right time. We believe that there is a place for a similar role in the data team. The data product manager is responsible for collecting data needs from data consumers (data scientists, data analysts, application owners), prioritizing them, and working closely with data engineering to deliver the data product on time and on budget.
The data product must deliver business value and realize ROI, for example through better-informed decision making or quicker application development. For this to happen, data delivery must follow a definite timeline, a kind of service level agreement between IT and the business.
In the data-as-a-product approach, data engineers, data testers, and data product managers collaborate to deliver the right data, to the right users, at the right time.
The most obvious way to engineer a data product is to model it around the business entity that it supports, such as a customer, employee, credit card, product, or anything else that is important to the business.
Each business entity instance (for example, customer John Smith) should be complete in all its attributes, enriched via analytics (for example, propensity to churn), and easily accessible to any data consumer (person or application) that has access rights to that entity.
Usage of the business entity should be measurable. How is the data accessed, and how long does it take to get to it (response time)? How often is it accessed, and by whom? Who tried to access it, but didn’t have the right credentials? Which insights did it drive? The list goes on and on.
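As a rough illustration of how such usage questions could be answered, the sketch below summarizes a list of access events. The `AccessEvent` record and the `summarize_usage` function are hypothetical names chosen for this example, and it assumes that each access attempt is already being logged with the consumer, the response time, and whether access was granted.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AccessEvent:
    """Hypothetical log record for one attempt to access a business entity."""
    consumer: str
    response_ms: float
    granted: bool


def summarize_usage(events: List[AccessEvent]) -> Dict[str, object]:
    """Answer: how often, by whom, how fast, and who was turned away."""
    granted = [e for e in events if e.granted]
    return {
        "access_count": len(granted),
        "by_consumer": dict(Counter(e.consumer for e in granted)),
        "avg_response_ms": mean(e.response_ms for e in granted) if granted else None,
        "denied_consumers": sorted({e.consumer for e in events if not e.granted}),
    }


events = [
    AccessEvent("crm", 12.0, True),
    AccessEvent("crm", 18.0, True),
    AccessEvent("analytics", 30.0, True),
    AccessEvent("intern_tool", 5.0, False),
]
summary = summarize_usage(events)
```

Denied attempts are kept out of the response-time average so that fast rejections do not make the data product look quicker than it is for authorized consumers.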
The overall quality of the data product must be assured in terms of completeness, integrity, and freshness (that is, the data is always up to date).
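Two of these quality dimensions lend themselves to simple checks. The sketch below is one possible way to score completeness and test freshness. The function names, the required-field list, and the maximum-age threshold are assumptions made for illustration, and integrity checks are not covered.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def completeness(record: Dict[str, object], required_fields: List[str]) -> float:
    """Fraction of required fields that hold a non-empty value."""
    if not required_fields:
        return 1.0
    filled = sum(1 for name in required_fields if record.get(name) not in (None, ""))
    return filled / len(required_fields)


def is_fresh(last_synced: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """True if the last sync happened within the allowed age."""
    now = now or datetime.now(timezone.utc)
    return now - last_synced <= max_age


record = {"name": "John Smith", "email": "", "phone": "555-0100"}
score = completeness(record, ["name", "email", "phone", "address"])

checked_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(checked_at - timedelta(minutes=5), timedelta(minutes=15), now=checked_at)
stale = is_fresh(checked_at - timedelta(hours=2), timedelta(minutes=15), now=checked_at)
```

Passing `now` explicitly keeps the freshness check deterministic, which makes it easier to test.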
A generative Data Product Platform, which leverages AI to manage, prepare, and deliver data in the form of business entities, is the ideal platform for delivering data products to data consumers, because it automatically defines and controls the entire data product lifecycle.
A data product platform defines an intermediary data schema aggregating all the attributes of a business entity (such as a customer, product, location or order) across all systems, in order to prepare and deliver the data as an integrated data product.
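The idea of an intermediary schema can be illustrated with a small mapping from source-system fields to entity attributes. This is only a sketch under stated assumptions: the system names (`crm`, `billing`), the field names, the `build_entity` function, and the "first non-null value wins" precedence rule are all hypothetical and not taken from any specific platform.

```python
from typing import Dict

Record = Dict[str, object]

# Hypothetical mapping: source system -> {source field: entity attribute}
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "crm": {"full_name": "name", "mail": "email"},
    "billing": {"balance_due": "balance", "cust_email": "email"},
}


def build_entity(entity_id: str, source_records: Dict[str, Record],
                 field_map: Dict[str, Dict[str, str]] = FIELD_MAP) -> Record:
    """Aggregate one entity's attributes from several source systems."""
    entity: Record = {"id": entity_id}
    # Systems are visited in the order given; earlier systems take precedence.
    for system, record in source_records.items():
        for source_field, attribute in field_map.get(system, {}).items():
            value = record.get(source_field)
            if value is not None and attribute not in entity:
                entity[attribute] = value  # first non-null value wins
    return entity


customer = build_entity(
    "C-1",
    {
        "crm": {"full_name": "John Smith", "mail": "john@example.com"},
        "billing": {"balance_due": 42.5, "cust_email": "j.smith@example.com"},
    },
)
```

Here the CRM email is kept because the CRM record is listed first. A real deployment would need an explicit, governed rule for resolving such conflicts between systems.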
Such a platform is key to supporting the data as a product methodology. It essentially integrates data from all sources by business entity, cleansing, validating, enriching, and transforming it in flight, and employing data masking tools when required. It may be deployed as a data mesh, data fabric, or customer data platform.
Data Product Platform, with its patented approach to organizing data by business entities, transforms fragmented enterprise data into data products – enabling companies to proactively adopt the "data as a product" mindset necessary to sustain data-driven leadership.