    Snowflake RAG: How to use Snowflake and Retrieval-Augmented Generation for Operational GenAI

    Iris Zarecki


    Product Marketing Director

    Snowflake RAG excels at analytical GenAI, where response times are measured in minutes. With K2view, it can now also support real-time operational GenAI.

    Implementing RAG with Snowflake in brief 

    Retrieval-Augmented Generation (RAG) is a design pattern for integrating your organization’s private internal data with the publicly available external data the Large Language Model (LLM) was trained on, to generate more accurate and personalized responses to user queries.  

    For example, say a frequent flyer asks an airline’s RAG chatbot, “How many frequent flyer credits do I have?” An operational GenAI RAG tool would retrieve all the data related to that customer to generate a protected and precise real-time response, like “Mike, you have 5,000 frequent flyer credits available for use.” An agentic AI mechanism might go even further by suggesting, “I notice that, as a rule, you generally like to apply your credits to business-class upgrades. Would you like to apply 2,500 credits to an upgrade on the flight you just reserved?”

    RAG tools retrieve relevant business data from data stores like Snowflake and then augment the LLM via contextually enriched prompts. Specifically, Snowflake Cortex implements RAG by:

    1. Leveraging an LLM and a YAML semantic model (which describes the Snowflake data model) to translate a user prompt into SQL queries
    2. Executing the queries to retrieve the needed data from Snowflake
    3. Serving the retrieved data to the LLM as an enriched user prompt, so it can answer the user’s question most effectively
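The three steps above can be sketched in Python. Everything here is illustrative: the helper names and the semantic model layout are hypothetical stand-ins, not Snowflake Cortex APIs, and the LLM and warehouse calls are stubbed.

```python
# Illustrative sketch of the three-step Cortex RAG flow described above.
# translate_to_sql, run_query, and build_prompt are hypothetical helpers.

def translate_to_sql(question: str, semantic_model: dict) -> str:
    # Step 1: an LLM plus the YAML semantic model would map the question
    # onto the warehouse schema; stubbed here with a fixed template.
    table = semantic_model["tables"]["credits"]
    return f"SELECT balance FROM {table} WHERE customer_id = %(id)s"

def run_query(sql: str, params: dict) -> list:
    # Step 2: execute the query against Snowflake; stubbed with canned rows.
    return [{"balance": 5000}]

def build_prompt(question: str, rows: list) -> str:
    # Step 3: enrich the user prompt with the retrieved data.
    return f"Context: {rows}\nQuestion: {question}"

semantic_model = {"tables": {"credits": "LOYALTY.CREDITS"}}
question = "How many frequent flyer credits do I have?"
sql = translate_to_sql(question, semantic_model)
rows = run_query(sql, {"id": "mike-42"})
prompt = build_prompt(question, rows)
```

The enriched prompt, not the raw question, is what reaches the LLM, which is why the answer can be personalized and grounded in the customer's actual data.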

    Snowflake RAG can be a drag for operational GenAI 

    Implementing RAG with Snowflake is an excellent choice for analytical GenAI. For example, answering a sales analyst’s question like “Compare this year’s Q3 sales results with last year’s, with a breakdown by region, industry, and product” may require several complex queries that join multiple large tables in Snowflake – a process that might take several minutes to execute.

    But, when it comes to operational GenAI use cases, such as assisting 500 contact center agents with a conversational AI chatbot, Snowflake falls short of the requirements for: 

    • Conversational latency 
      Chatbots need to mimic human conversation without any lag time. 
    • Privacy 
      A customer service agent must only be able to ask questions about the individual customer they’re dealing with in a particular session. Maintaining this level of protection in Snowflake introduces considerable complexity because: 
      • Millions of Snowflake user profiles would have to be individually defined, which is impractical. 
      • Personally Identifiable Information (PII) and other sensitive data would need to be masked from the agent. 
    • Concurrency 
      The RAG framework should be capable of supporting hundreds of agents at the same time. Scaling Snowflake clusters up to support this level of concurrency can be expensive. 
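As a rough illustration of the masking requirement above, a minimal PII redaction pass might look like the following. The patterns and placeholder tokens are assumptions for illustration, not K2view’s or Snowflake’s actual masking logic.

```python
import re

# Hypothetical PII masking: redact email addresses, and hide all but the
# last four digits of long digit runs (e.g. card or account numbers)
# before retrieved data reaches a contact center agent.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
LONG_DIGITS = re.compile(r"\b\d{9,}\b")

def mask_pii(text: str) -> str:
    text = EMAIL.sub("[email]", text)
    return LONG_DIGITS.sub(
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:], text
    )

masked = mask_pii("Card 4111111111111111, contact mike@example.com")
# masked == "Card ************1111, contact [email]"
```

In practice this kind of masking has to be applied dynamically, per session, so that each agent sees only the redacted view of the one customer they are serving.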

    We recently surveyed 300 AI professionals across many different industries and found that the 2 leading generative AI use cases are both operational. 

    Top GenAI use cases 

    Snowflake RAG is inappropriate for addressing generative AI use cases in customer service, which are now being implemented by more than 50% of B2C organizations.

    The reasons Snowflake RAG is unsuitable for operational GenAI come down to speed, security, and cost. More on this in the next section… 

    Addressing Snowflake RAG challenges for operational GenAI 

    Most organizations store their raw enterprise data in data lakes like Snowflake Data Cloud. With this data already centralized and easily accessible, it makes perfect sense to leverage it for augmenting an LLM with RAG. LLM agents are implemented to access the structured data in Snowflake, as shown below.

    RAG for structured data diagram

    Despite the many advantages of data lakes (scalability, adaptability, and storage cost efficiency, to name a few), let’s examine some of their limitations when it comes to implementing enterprise RAG: 

    • Speed traps 
      RAG for structured data requires real-time data ingestion which, in turn, depends on the source systems’ streaming capabilities. Additionally, joining data across multiple large Snowflake tables can be time- and compute-intensive, especially since data lakes aren’t equipped to efficiently handle queries without preconfigured indexes. Finally, data lakes are inappropriate for operational workloads that require split-second conversational AI latency. And, of course, the speed issues are compounded when hundreds of users ask questions at the same time. 
    • PII in the sky 
      PII and other sensitive data should never see the light of day, but ensuring AI data privacy in data lakes is very challenging due to the sheer diversity of data and access methods. Plus, giving an LLM access to your entire data lake is not a good idea, since you might accidentally disclose someone else’s data! And setting up a view for each one of your customers is simply impractical. 
    • When ELT stands for Extremely Long and Tedious 
      GenAI is only as good as the data feeding it. Data lakes ingest raw data using ELT tools. To achieve data quality for AI, the data must be cleansed and transformed in the data lake itself – requiring extensive data engineering and compute resources and making it an extremely long, tedious, and costly process.  
    • Other hidden costs 
      Managing, processing, and securing complex data can be expensive, especially when it’s sourced from multiple enterprise systems. Also, cloud-based data lakes often run on a pay-per-query model, where costs are unpredictable and difficult to control. 

    K2view provides tunnel vision into Snowflake 

    K2view GenAI Data Fusion is a semantic data layer, optimized for Snowflake and operational GenAI use cases.

    It’s an in-memory cache that dynamically organizes Snowflake data by entities. For example, in customer service the data would be organized by customers, with the data for each customer stored and managed in a “data lake of one”. In such a Micro-Database™, all the data for a specific customer (across all Snowflake tables) is organized as a single unit that can be queried in milliseconds via ANSI SQL.  
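To illustrate the “data lake of one” idea, here is a minimal sketch using an in-memory SQLite database as a stand-in for a single customer’s Micro-Database. K2view’s actual API is not shown; the point is that all of one customer’s rows live in a small, self-contained unit that plain ANSI SQL can query in milliseconds.

```python
import sqlite3

# Stand-in for one customer's "data lake of one": a tiny, self-contained
# store holding only Jo's data, queried with ordinary ANSI SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE credits (customer_id TEXT, balance INTEGER)")
db.execute("INSERT INTO credits VALUES ('jo', 2500)")

(balance,) = db.execute(
    "SELECT balance FROM credits WHERE customer_id = 'jo'"
).fetchone()
```

Because the unit contains one customer’s data and nothing else, queries stay fast at any concurrency and can never leak another customer’s records.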

    So, when Jo, an authorized user, asks Robota the chatbot a question, K2view provides the appropriate LLM agent with secure access to all of Jo’s data (and only Jo’s data) at conversational latency – with the ability to answer any query related to Jo.  

    With K2view, you can now apply Snowflake RAG to operational GenAI use cases and start benefiting from enhanced cost savings and better user experiences.  

    Just one thing: When you call Snowflake about deploying Cortex RAG in your call center, be sure to tell them K2view sent you :) 

    Learn how K2view GenAI Data Fusion extends Snowflake RAG into the realm of operational GenAI. 
