Snowflake RAG excels at analytical GenAI, where response times are measured in minutes. Thanks to K2view, it can now support real-time operational GenAI as well.
Retrieval-Augmented Generation (RAG) is a design pattern that integrates your organization's private internal data with the publicly available data the Large Language Model (LLM) was trained on, to generate more accurate and personalized responses to user queries.
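At its core, the pattern boils down to three steps: retrieve private data relevant to the query, enrich the prompt with that context, and send the enriched prompt to the LLM. Here's a minimal sketch (all function names and the in-memory "data store" are hypothetical, standing in for a real retrieval backend such as Snowflake):

```python
def retrieve_context(query: str, user_id: str) -> str:
    """Placeholder retrieval step -- in practice this would query
    an internal data store such as Snowflake."""
    records = {"mike": "frequent_flyer_credits: 5000"}
    return records.get(user_id, "")

def build_prompt(query: str, context: str) -> str:
    """Augment the user's question with the retrieved private context."""
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

question = "How many frequent flyer credits do I have?"
context = retrieve_context(question, "mike")
prompt = build_prompt(question, context)
# The enriched prompt -- not the raw question -- is what the LLM receives,
# which is how it can answer with the customer's actual credit balance.
```

The key point is that the LLM itself is never retrained; only the prompt changes.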
For example, say a frequent flyer asks an airline’s RAG chatbot, “How many frequent flyer credits do I have?” An operational GenAI RAG tool would retrieve all the data related to that customer to generate a protected and precise real-time response, like “Mike, you have 5,000 frequent flyer credits available for use.” An agentic AI mechanism might go even further by suggesting, “I notice that, as a rule, you generally like to apply your credits to business-class upgrades. Would you like to apply 2,500 credits to an upgrade on the flight you just reserved?”
RAG tools retrieve relevant business data from data stores like Snowflake and then augment the LLM via contextually enriched prompts. Specifically, Snowflake Cortex implements RAG by:
Implementing RAG with Snowflake is an excellent choice for analytical GenAI. For example, answering a sales analyst's question like "Compare this year's Q3 sales results with last year's, with a breakdown by region, industry, and product" may require several complex queries that join multiple large tables in Snowflake. This process might take several minutes to execute.
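The analyst's question above might translate into SQL along these lines (the table and column names are illustrative, not an actual warehouse schema):

```python
# Illustrative analytical SQL for the year-over-year Q3 comparison.
# Table and column names are hypothetical; a real schema will differ.
Q3_COMPARISON_SQL = """
SELECT r.region,
       i.industry,
       p.product_name,
       SUM(CASE WHEN s.fiscal_year = 2024 THEN s.amount END) AS q3_this_year,
       SUM(CASE WHEN s.fiscal_year = 2023 THEN s.amount END) AS q3_last_year
FROM sales s
JOIN regions r    ON s.region_id   = r.region_id
JOIN industries i ON s.industry_id = i.industry_id
JOIN products p   ON s.product_id  = p.product_id
WHERE s.fiscal_quarter = 'Q3'
GROUP BY r.region, i.industry, p.product_name
"""
# Joining several large fact and dimension tables and aggregating across
# them is exactly why such analytical queries can take minutes to run.
```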
But, when it comes to operational GenAI use cases, such as assisting 500 contact center agents with a conversational AI chatbot, Snowflake would fall short due to the need for:
We recently conducted a survey of 300 AI professionals, spanning many different industries, and found that the two leading generative AI use cases are both operational.
Snowflake RAG is ill-suited to generative AI use cases in customer service, which are now being implemented by more than 50% of B2C organizations.
Why Snowflake RAG falls short for operational GenAI comes down to three factors: speed, security, and cost. More on this in the next section…
Most organizations store their raw enterprise data in data lakes like Snowflake Data Cloud. With this data already centralized and easily accessible, it makes perfect sense to leverage it for augmenting an LLM with RAG. LLM agents are implemented to access the structured data in Snowflake.
Despite the many advantages of data lakes (scalability, adaptability, and storage cost efficiency, to name a few), let’s examine some of their limitations when it comes to implementing an enterprise RAG:
K2view GenAI Data Fusion is a semantic data layer, optimized for Snowflake and operational GenAI use cases.
It’s an in-memory cache that dynamically organizes Snowflake data by entities. For example, in customer service the data would be organized by customers, with the data for each customer stored and managed in a “data lake of one”. In such a Micro-Database™, all the data for a specific customer (across all Snowflake tables) is organized as a single unit that can be queried in milliseconds via ANSI SQL.
So, when Jo, an authorized user, asks Robota the chatbot a question, K2view provides the appropriate LLM agent with secure access to all of Jo's data (and only Jo's data) at conversational latency – with the ability to answer any query related to Jo.
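Conceptually, that entity scoping might look like the sketch below. The names are hypothetical and the scoping is shown as simple query rewriting purely for illustration; in K2view's actual architecture, isolation is enforced by the Micro-Database layer itself, not by the caller:

```python
# Sketch of entity-scoped retrieval: every query the chatbot's agent
# issues is constrained to a single customer's data unit, so the agent
# can only ever see that customer's data. All names are illustrative.

def scoped_query(customer_id: str, sql: str) -> str:
    """Constrain a query to one customer's data unit.

    A real data layer would enforce this boundary itself; rewriting the
    SQL here just makes the guarantee visible.
    """
    return f"{sql.rstrip(';')} WHERE customer_id = '{customer_id}'"

# The agent asks a question on Jo's behalf; only Jo's rows are reachable.
q = scoped_query("jo-123", "SELECT SUM(credits) FROM loyalty")
```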
With K2view, you can now apply Snowflake RAG to operational GenAI use cases and start benefiting from enhanced cost savings and better user experiences.
Just one thing: When you call Snowflake about deploying Cortex RAG in your call center, be sure to tell them K2view sent you :)
Learn how K2view GenAI Data Fusion extends Snowflake RAG into the realm of operational GenAI.