GenAI Data: Is Your Data Ready for Generative AI?

Data readiness is the ability to prove the fitness of data for generative AI use cases. Jean-Luc Chatelain told us how it affects enterprise GenAI adoption.

Data readiness influences generative AI success

Generative AI capabilities have become a must for digital transformation. While the percentage of companies embarking on generative AI projects is rising, one key challenge continues to stand in their way: data readiness.

As the not-so-old adage goes, bad data leads to bad AI. Poor data quality and untapped value trapped in data silos continue to pose production and scalability challenges for generative AI projects.

In a recent webinar, we spoke to Jean-Luc Chatelain, founder and managing director at Verax Capital Advisors and former CTO of AI at Accenture, about data challenges that stifle generative AI adoption, and how emerging technologies such as Retrieval Augmented Generation (RAG) can help.

Generative AI projects start with data readiness

Simon McVeigh, who leads K2view’s sales engineering team and hosted the webinar, cited a Gartner study that found only 20% of AI projects, including generative AI, are being deployed at scale. Another 30% are abandoned before reaching production. In both cases, lack of data readiness is one of the top issues derailing generative AI projects.

Why are enterprises still facing data readiness issues when they’ve been prioritizing and investing in data organization for years? Jean-Luc outlined 3 main reasons:

Poorly curated data

At least 50% of the value of enterprise data remains trapped, in both structured and unstructured form, because of poor data management. Generative AI models can’t retrieve data that isn’t curated or maintained correctly. What’s left is incomplete data, which often leads to an irrelevant answer or an AI hallucination.

Rising volumes of transactional data

Transactional data (such as service requests, purchases, payments, and invoices) is the lifeblood of the enterprise. Its volume is constantly increasing, which makes data curation and cleansing an ongoing challenge.

Manual data preparation

Data scientists spend 60-80% of their time preparing data manually and checking its quality. “We’re using very expensive people to serve as data janitors,” said Jean-Luc. This is one of the main reasons most enterprises cannot go beyond POCs and scale AI in production. The cost of having a human in the loop is far too high.

Data readiness = data organization + data quality

Organizing your data and assuring its quality are 2 important aspects of data readiness. Both have an impact on your RAG architecture, which augments your Large Language Model (LLM) with trusted business entity data. For example, the ability to inject real-time customer data from your company's own sources into your LLM results in better AI personalization and more meaningful interactions.

LLMs typically use a pre-trained model based on publicly available information from the Internet, explained Jean-Luc. But that doesn’t give you the whole picture. None of these models knows the specifics of your business. By accessing data from your own private data stores, your LLM now has the means to produce more relevant answers.

Other methods, such as fine-tuning, may also contribute to data readiness. In a retrieval-augmented generation vs fine-tuning comparison, however, RAG proves to be more reliable.

Enhancing data readiness with RAG

Retrieval-augmented generation is a generative AI framework that enhances your LLM by infusing fresh, reliable data from authoritative internal knowledge bases (unstructured data) and enterprise systems (structured data) to generate more informed, context-specific, and dependable responses.

[Figure: RAG diagram for data readiness] RAG turns generic LLMs into business-specific models.

The RAG model retrieves, selects, and prioritizes the most pertinent information from the appropriate sources based on the user's query. It then transforms this data into an enriched, contextual prompt to provide an accurate and coherent response to the user.

In short, RAG turns your generic LLM into one that knows your business and your customers.
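The retrieve, select, and enrich flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline or any vendor's implementation: the `KNOWLEDGE_BASE` passages, the function names, and the word-overlap scoring are all assumptions made for the example. Real RAG systems typically use embeddings and a vector store rather than keyword matching.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Count how many distinct query tokens also occur in the passage (toy relevance)."""
    passage_tokens = Counter(tokenize(passage))
    return sum(1 for token in set(tokenize(query)) if passage_tokens[token] > 0)


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by score, drop those with no overlap, and keep the top_k."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked if score(query, p) > 0][:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the user's question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Hypothetical internal knowledge base (illustrative content only).
KNOWLEDGE_BASE = [
    "Router restart: unplug the router for 30 seconds, then plug it back in.",
    "Billing: invoices are issued on the first day of each month.",
    "Outage policy: customers are credited for outages longer than 24 hours.",
]

question = "How do I restart my router?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)
```

The enriched prompt, rather than the bare question, is what would be sent to the LLM.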

RAG in action

Simon gave a simple example of how RAG achieves ROI for generative AI projects and improves the customer experience. Imagine a customer of an Internet Service Provider (ISP) whose Internet is down interacting with the ISP’s chatbot. He tells the bot that his internet isn’t working. The bot, trained on generic data, provides a comprehensive response about how the customer can troubleshoot the issue, such as by restarting his router.

This response isn’t technically wrong. It’s one possible solution, after all. The problem is that it’s far too generic: it lacks context about this specific customer’s situation. Perhaps the customer’s whole region is experiencing an Internet outage. Or maybe he’s late paying his bill.

An advanced RAG chatbot would enable the LLM to combine information specific to the enterprise and to the customer to provide a more accurate, up-to-date, and contextually relevant response.
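To make the ISP example concrete, here is a small Python sketch of how structured customer facts might be added to the chatbot prompt. Everything in it is hypothetical: the `Customer` fields, the set of regions with an outage, and the prompt wording are assumptions for illustration, not the design of any particular RAG tool.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical slice of structured customer data from enterprise systems."""
    name: str
    region: str
    bill_overdue: bool


def customer_context(customer: Customer, regions_with_outage: set[str]) -> list[str]:
    """Turn structured facts about one customer into plain-text context lines."""
    facts = []
    if customer.region in regions_with_outage:
        facts.append(
            f"There is a known outage in the customer's region ({customer.region})."
        )
    if customer.bill_overdue:
        facts.append("The customer's latest bill is overdue.")
    if not facts:
        facts.append("No outage or billing issue is recorded for this customer.")
    return facts


def grounded_prompt(message: str, customer: Customer, regions_with_outage: set[str]) -> str:
    """Prepend customer-specific facts to the message before it goes to the LLM."""
    facts = "\n".join(
        f"- {fact}" for fact in customer_context(customer, regions_with_outage)
    )
    return f"Customer facts:\n{facts}\nCustomer message: {message}"


customer = Customer(name="Alex", region="North", bill_overdue=False)
print(grounded_prompt("My internet isn't working.", customer, {"North"}))
```

With the outage fact in the prompt, the model has what it needs to mention the regional outage instead of suggesting a router restart.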

Is your data ready for RAG?

Here’s Jean-Luc’s take on the top 5 data issues to address for RAG conversational AI:

Dealing with unstructured and structured data

In the context of RAG, the focus is usually on unstructured data. However, unstructured data alone cannot provide a complete picture of the enterprise, and it doesn’t account for individual customer situations. According to Jean-Luc, unstructured data provides only half of the answer. Providing the most relevant and valuable responses requires both unstructured and structured (transactional) data. However, ensuring proper data preparation and structured data quality is difficult.

Conversing in real-time

People expect their conversations with chatbots to feel as natural as talking to a person. Pausing mid-sentence leads to a frustrating customer experience. Providing accurate responses in a dynamic environment requires the ability to access source systems in the RAG pipeline and process transactional data in real time. These constraints often mean companies must choose between speed and accuracy.

Tearing down silos of information

Information silos created by systems of record like ERP, HRM, CRM, and others lead to data fragmentation, which makes enterprise-wide data retrieval difficult. For RAG to work, information stored in enterprise systems must be accessible, of high quality, and include accurate metadata.

Ensuring data security and privacy

Personally Identifiable Information (PII) and other sensitive data are vulnerable to data leaks or malicious attacks. It’s your responsibility to ensure your data is protected during the entire process, by masking it properly and putting the right access controls in place.

Balancing cost and scalability

One reason so many generative AI projects never make it to deployment is that they become too expensive to implement at scale. Often, it ends up being cheaper to continue doing certain processes manually than to deploy generative AI. For companies with millions of customers, the cost of supporting a high volume of inbound requests and inferences can eliminate any prospect of ROI.
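As an illustration of the masking mentioned under data security and privacy, the Python sketch below replaces email addresses and phone numbers with placeholders before text is passed to an LLM. The two regular expressions are deliberately simplified assumptions; real PII detection covers many more data types and formats and is usually combined with access controls.

```python
import re

# Simplified patterns for illustration; real PII detection needs broader coverage.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace email addresses and NNN-NNN-NNNN phone numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)


record = "Contact Dana at dana@example.com or 555-123-4567 about the late invoice."
print(mask_pii(record))
```

Masking before retrieval keeps sensitive values out of both the prompt and any logs of it, at the cost of losing those values as context.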

Maximizing generative AI data readiness

The key to successfully deploying GenAI projects and gaining the most possible value from RAG is maximizing generative AI data readiness. Clean, high-quality, and unified data with the correct metadata plays a major role in avoiding generic responses and providing a satisfactory customer experience. In addition, the data must be secured and available for real-time conversations at scale.

According to Jean-Luc, there are 3 key focus areas for improving data readiness:

Significantly improve data quality automation

Ensure data scientists aren’t spending the bulk of their time manually cleaning and preparing data for LLM use. Develop a system for keeping up with data preparation requirements as the volume of transactional data rises.

Focus on trustworthiness

Data security and privacy must be top-of-mind when planning generative AI projects. As the use of RAG in GenAI expands, so do the types and severity of security threats. All sensitive data, from production to usage, must be protected.

Break down data silos

Ensure data stored in enterprise systems is accessible and unified, with the correct metadata, so it can be accessed and used to provide context-based responses in real time.
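The first focus area, automating data quality, can start with something as simple as a scripted completeness check. The Python sketch below is one possible starting point under stated assumptions: the `REQUIRED_FIELDS` schema and the sample records are hypothetical, and a real system would add validity, freshness, and consistency checks.

```python
REQUIRED_FIELDS = ("customer_id", "email", "region")  # hypothetical schema


def find_issues(record: dict) -> list[str]:
    """Return the names of required fields that are missing or blank."""
    return [
        field for field in REQUIRED_FIELDS
        if record.get(field) is None or str(record.get(field)).strip() == ""
    ]


def quality_report(records: list[dict]) -> dict:
    """Summarize how many records are complete and list the ones that are not."""
    failures = {}
    for index, record in enumerate(records):
        issues = find_issues(record)
        if issues:
            failures[index] = issues
    complete = len(records) - len(failures)
    return {
        "total": len(records),
        "complete": complete,
        "completeness": complete / len(records) if records else 0.0,
        "failures": failures,
    }


# Illustrative records: the second has a blank email, the third has no region.
records = [
    {"customer_id": 1, "email": "a@example.com", "region": "North"},
    {"customer_id": 2, "email": "", "region": "South"},
    {"customer_id": 3, "email": "c@example.com"},
]
report = quality_report(records)
print(report)
```

Running a check like this on every load, instead of by hand, is the kind of automation that frees data scientists from routine cleanup.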

Learn more about K2view GenAI Data Fusion, the RAG tool that makes your data GenAI-ready.

Data Readiness Can Make or Break Your GenAI Projects

Iris Zarecki

Product Marketing Director
