    GenAI Data: Is Your Data Ready for Generative AI?

    Iris Zarecki

    Product Marketing Director

    Data readiness is the ability to prove the fitness of data for generative AI use cases. Jean-Luc Chatelain told us how it affects enterprise GenAI adoption.  

    Data readiness influences generative AI success

    Generative AI capabilities have become a must for digital transformation. While the percentage of companies embarking on generative AI projects is rising, one key challenge continues to stand in their way: data readiness.

    As the not-so-old adage goes, bad data leads to bad AI. Poor data quality and untapped value trapped in data silos continue to pose production and scalability challenges for generative AI projects.

    In a recent webinar, we spoke to Jean-Luc Chatelain, founder and managing director at Verax Capital Advisors and former CTO of AI at Accenture, about data challenges that stifle generative AI adoption, and how emerging technologies such as Retrieval Augmented Generation (RAG) can help.   

    Generative AI projects start with data readiness 

    Simon McVeigh, who leads K2view’s sales engineering team and hosted the webinar, cited a Gartner study that found only 20% of AI projects, including generative AI, are being deployed at scale. Another 30% are abandoned before reaching production. In both cases, lack of data readiness is one of the top issues derailing generative AI projects.

    Why are enterprises still facing data readiness issues when they’ve been prioritizing and investing in data organization for years? Jean-Luc outlined 3 main reasons:

    1. Poorly curated data 

      At least 50% of the value of enterprise data remains trapped in structured and unstructured sources due to poor data management. Generative AI models can’t retrieve data that isn’t correctly curated and maintained. What’s left is incomplete data, which often leads to irrelevant answers or AI hallucinations.

    2. Rising volumes of transactional data  

      Transactional data – such as service requests, purchases, payments, and invoices – is the lifeblood of the enterprise. Its volume is constantly growing, which makes data curation and cleansing an increasingly difficult task.

    3. Manual data preparation 

      Data scientists spend 60-80% of their time preparing data manually and checking its quality. “We’re using very expensive people to serve as data janitors,” said Jean-Luc. This is one of the main reasons most enterprises cannot go beyond POCs and scale AI in production. The cost of having a human in the loop is far too high. 

    Data readiness = data organization + data quality

    Organizing your data and assuring its quality are 2 important aspects of data readiness. Both have an impact on your RAG architecture, which augments your Large Language Model (LLM) with trusted business entity data. The ability to inject your LLM with real-time customer data, for example, from your company's own sources, results in better AI personalization and more meaningful interactions.  

    LLMs typically use a pre-trained model based on publicly available information from the Internet, explained Jean-Luc. But that doesn’t give you the whole picture. None of these models knows the specifics of your business. By accessing data from your own private data stores, your LLM now has the means to produce more relevant answers.

    Although other methods, like fine-tuning, may also contribute to data readiness, in a retrieval-augmented generation vs fine-tuning comparison, RAG proves to be the more reliable approach.

    Enhancing data readiness with RAG 

    Retrieval-augmented generation is a generative AI framework that enhances your LLM by infusing fresh, reliable data from authoritative internal knowledge bases (unstructured data) and enterprise systems (structured data) to generate more informed, context-specific, and dependable responses.

     

    RAG diagram for data readiness: RAG turns generic LLMs into business-specific models.

    The RAG model retrieves, selects, and prioritizes the most pertinent information from the appropriate sources based on the user's query. It then transforms this data into an enriched, contextual prompt to provide an accurate and coherent response to the user.  

    In short, RAG turns your generic LLM into one that knows your business and your customers.
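    To make that flow concrete, here is a minimal RAG sketch in Python. The in-memory knowledge base, the keyword-overlap retriever, and the llm_complete() placeholder are all illustrative assumptions, not K2view’s implementation; a production pipeline would use vector search over enterprise sources and a real LLM endpoint.

```python
# A minimal, illustrative RAG flow in plain Python (an assumption for this
# article, not K2view's implementation).

KNOWLEDGE_BASE = [
    "Routers should be restarted after firmware updates.",
    "Fiber customers in the Northeast region are covered by SLA tier 2.",
    "Invoices are issued on the 1st of each month.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank knowledge-base snippets by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved snippets into an enriched, contextual prompt."""
    context_block = "\n".join(f"- {snippet}" for snippet in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

def llm_complete(prompt: str) -> str:
    """Placeholder for the call to your LLM provider of choice."""
    return f"[LLM response grounded in a {len(prompt)}-character prompt]"

question = "Why is my invoice late?"
print(llm_complete(build_prompt(question, retrieve(question))))
```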

    RAG in action 

    Simon gave a simple example of how RAG achieves ROI for generative AI projects and improves the customer experience. Imagine a customer of an Internet Service Provider (ISP) whose Internet is down interacting with the ISP’s chatbot. He tells the bot that his internet isn’t working. The bot, trained on generic data, provides a comprehensive response about how the customer can troubleshoot the issue, such as by restarting his router.  

    This response isn’t technically wrong. It’s one possible solution, after all. The problem is that it’s far too generic: it lacks context about this specific customer’s situation. Perhaps the customer’s whole region is experiencing an Internet outage. Or maybe he’s late paying his bill.

    An advanced RAG chatbot would enable the LLM to combine information specific to the enterprise and to the customer to provide a more accurate, up-to-date, and contextually relevant response.  
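    As a rough illustration of that idea, the sketch below shows how customer-specific structured data might be turned into context for the chatbot’s LLM. The REGION_OUTAGES set and the get_customer_record() lookup are hypothetical stand-ins for real outage-management and billing/CRM systems.

```python
# Hypothetical sketch of grounding an ISP chatbot in customer-specific data.
# REGION_OUTAGES and get_customer_record() are illustrative stand-ins.

REGION_OUTAGES = {"northeast"}  # regions with a known outage right now

def get_customer_record(customer_id: str) -> dict:
    """Stand-in for a real-time lookup against billing and CRM systems."""
    return {"customer_id": customer_id, "region": "northeast", "bill_overdue": True}

def build_chatbot_context(customer_id: str) -> str:
    """Assemble the customer-specific facts the RAG pipeline adds to the prompt."""
    record = get_customer_record(customer_id)
    facts = []
    if record["region"] in REGION_OUTAGES:
        facts.append(f"There is a known outage in the {record['region']} region.")
    if record["bill_overdue"]:
        facts.append("The customer's last bill is overdue, which can suspend service.")
    if not facts:
        facts.append("No outage or billing issue found; suggest restarting the router.")
    return " ".join(facts)

# This context would be prepended to the customer's "my internet isn't working"
# message before the LLM generates its reply.
print(build_chatbot_context("CUST-1042"))
```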

    Is your data ready for RAG?

    Here’s Jean-Luc’s take on the top 5 data issues to address for RAG conversational AI:

    1. Dealing with unstructured and structured data 

      In the context of RAG, the focus is usually on unstructured data. However, unstructured data alone cannot provide a complete picture of the enterprise, and it doesn’t account for individual customer situations. According to Jean-Luc, unstructured data provides only half of the answer. Providing the most relevant and valuable responses requires both unstructured and structured (transactional) data. However, ensuring proper data preparation and structured data quality is difficult. 

    2. Conversing in real time

      People expect their conversations with chatbots to feel as natural as talking to a person. A bot that pauses mid-conversation makes for a frustrating customer experience. Providing accurate responses in a dynamic environment requires the ability to access source systems in the RAG pipeline and process transactional data in real time. These constraints often force companies to choose between speed and accuracy.

    3. Tearing down silos of information 

      Information silos created by systems of record like ERP, HRM, CRM, and others lead to data fragmentation, which makes enterprise-wide data retrieval difficult. For RAG to work, information stored in enterprise systems must be accessible, of high quality, and include accurate metadata. 

    4. Ensuring data security and privacy 

      Personally Identifiable Information (PII) and other sensitive data are vulnerable to data leaks and malicious attacks. It’s your responsibility to ensure your data is protected throughout the entire process, with proper masking and the right access controls in place (see the masking sketch after this list).

    5. Balancing cost and scalability 

      One reason so many generative AI projects never make it to deployment is that they become too expensive to implement at scale. Often, it ends up being cheaper to continue doing certain processes manually than to deploy generative AI. For companies with millions of customers, the cost of supporting a high volume of inbound requests and inferences eliminates any prospect of ROI.
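    Here is the masking sketch referenced in issue 4 above: a minimal example, assuming email addresses and phone-style numbers are the PII to redact before retrieved text is assembled into a prompt. Production systems typically rely on dedicated masking or tokenization tooling with far broader coverage.

```python
# A minimal PII-masking sketch; the regex patterns are deliberately simple
# assumptions, not a complete masking policy.

import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace obvious PII with placeholders before prompt assembly."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

snippet = "Contact john.doe@example.com or 555-123-4567 about invoice #88."
print(mask_pii(snippet))  # -> Contact [EMAIL] or [PHONE] about invoice #88.
```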

    Maximizing generative AI data readiness 

    The key to successfully deploying GenAI projects and gaining the most value from RAG is maximizing generative AI data readiness. Clean, high-quality, and unified data with the correct metadata plays a major role in avoiding generic responses and providing a satisfactory customer experience. In addition, the data must be secured and available for real-time conversations at scale.

    According to Jean-Luc, there are 3 key focus areas for improving data readiness:

    • Significantly improve data quality automation 

      Ensure data scientists aren’t spending the bulk of their time manually cleaning and preparing data for LLM use. Develop a system for keeping up with data preparation requirements as the volume of transactional data rises (a minimal example of such automated checks follows this list).

    • Focus on trustworthiness 

      Data security and privacy must be top-of-mind when planning generative AI projects. As RAG GenAI expands, so do the types and severity of security threats. All sensitive data, from production to usage, must be protected. 

    • Break down data silos  

      Ensure data stored in enterprise systems is accessible and unified, with the correct metadata, so it can be accessed and used to provide context-based responses in real time.  
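    As a simple illustration of the data quality automation mentioned in the first focus area, the sketch below flags duplicate IDs, missing values, and stale records in a batch of transactions. The field names ('id', 'amount', 'timestamp') and rules are assumptions for the example, not a prescribed rule set.

```python
# A minimal sketch of automated data quality checks over transaction records.

from datetime import datetime, timedelta, timezone

def quality_issues(records: list[dict], max_age_days: int = 30) -> list[str]:
    """Flag duplicate IDs, missing amounts, and stale records automatically."""
    issues, seen_ids = [], set()
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    for rec in records:
        if rec.get("id") in seen_ids:
            issues.append(f"duplicate id: {rec['id']}")
        seen_ids.add(rec.get("id"))
        if rec.get("amount") is None:
            issues.append(f"missing amount for id: {rec.get('id')}")
        if rec.get("timestamp") and rec["timestamp"] < cutoff:
            issues.append(f"stale record: {rec.get('id')}")
    return issues

sample = [
    {"id": "T1", "amount": 42.0, "timestamp": datetime.now(timezone.utc)},
    {"id": "T1", "amount": None, "timestamp": datetime.now(timezone.utc)},
]
print(quality_issues(sample))  # -> ['duplicate id: T1', 'missing amount for id: T1']
```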

    Learn more about K2view GenAI Data Fusion, the RAG tool that makes your data GenAI-ready.
