AI hallucinations are AI-generated outputs that are factually incorrect, nonsensical, or inconsistent, due to bad training data or misidentified patterns.
01
The problem with AI hallucinations
Generative AI (GenAI) applications are impressive tools capable of generating human-quality text, translating languages, composing music, and much, much more. But a GenAI app can sometimes stumble into the twilight zone, producing spurious or silly responses – known as AI hallucinations. Such outputs are factually incorrect, nonsensical, or inconsistent.
There are two primary reasons why AI hallucinations occur.
The first culprit is the data the Large Language Model (LLM) was trained on. Just like a student relying on dated textbooks, an LLM trained on a faulty or incomplete dataset will reflect those flaws in its responses. For instance, an AI chatbot trained on a dataset of historical articles filled with factual errors might mistakenly place the Battle of Waterloo in Tahiti.
The second reason is how AI learns. Sometimes, AI models identify patterns in their training data that aren't truly connected – in a process called overfitting. Overfitting occurs when an LLM is trained “too well” on its training data, to the point where it captures the noise and specific details of that data, instead of the underlying distribution. As a result, the LLM performs well on training data, but poorly on new data because it hasn’t yet learned to generalize.
Like a chef who can prepare certain dishes incredibly well but can’t create new ones, the LLM becomes great at “regurgitating” the training data but can’t innovate, or adapt to new inputs.
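To make overfitting concrete, here's a minimal sketch (assuming NumPy and scikit-learn are installed) in which a high-degree polynomial model memorizes 20 noisy training points and then stumbles on unseen data – the same failure mode, in miniature, that keeps an overfit LLM from generalizing:

```python
# A minimal overfitting demo: a high-degree polynomial memorizes noisy
# training points but generalizes poorly to unseen data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X_train = rng.uniform(0, 1, 20).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(0, 0.2, 20)
X_test = rng.uniform(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

for degree in (3, 15):  # modest model vs. over-parameterized model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")

# Typically, the degree-15 model scores near-zero error on the training set
# but a much larger error on the test set: it has captured the noise,
# not the underlying pattern.
```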
The consequences of AI hallucinations can vary in severity. On the lighter side, they might simply be amusing. However, they can also have serious implications, such as an AI-based diagnostic model that misdiagnoses a condition based on the faulty analysis of an x-ray.
Just how prevalent are AI hallucinations? One AI startup created a model that continuously evaluates how often leading LLMs introduce hallucinations when summarizing a document. The current findings are stark: mainstream LLMs are quite hallucinatory, with Google Gemini hallucinating nearly 5% of the time and ChatGPT over 3% on average. Some lesser-known models hallucinate nearly 17% of the time.
Researchers are actively working on solutions. One approach involves developing better training methods. By grounding LLMs with higher quality, more diverse data, we can reduce the chances of them learning incorrect information or making spurious connections. In addition to LLM grounding, researchers are creating fact-checking mechanisms that compare GenAI outputs against real-world knowledge bases to identify and flag inconsistencies.
In the corporate world, a GenAI framework called Retrieval-Augmented Generation (RAG) grounds a company’s LLM with structured and unstructured data from its own private sources.
A RAG LLM reduces AI hallucinations and delivers more meaningful, personalized, and secure responses than ever before.
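As a rough illustration of the RAG pattern (a hedged sketch, not K2View's actual implementation), the snippet below retrieves relevant private documents and folds them into the prompt so the LLM answers from grounded context; `search_company_docs` and `call_llm` are hypothetical placeholders for your own retriever and model API:

```python
# A hypothetical, minimal RAG flow: retrieve private context, then generate.
from typing import List

def search_company_docs(query: str, top_k: int = 3) -> List[str]:
    """Placeholder retriever: in practice this would query a vector store
    or an enterprise search index over your private documents."""
    return ["(retrieved snippet 1)", "(retrieved snippet 2)"][:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for your LLM provider's chat/completion API."""
    return f"(LLM response grounded in: {prompt[:60]}...)"

def rag_answer(user_query: str) -> str:
    # 1. Retrieve the most relevant snippets from private sources.
    snippets = search_company_docs(user_query)
    context = "\n\n".join(snippets)

    # 2. Ground the prompt in the retrieved context.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}"
    )

    # 3. Generate a response anchored in real company data.
    return call_llm(prompt)

print(rag_answer("How many vacation days do I have left?"))
```

The design point is that the model is explicitly instructed to answer only from retrieved, trusted context, and to admit when that context is insufficient, which is what curbs fabrication.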
02
What causes AI hallucinations?
As mentioned above, AI hallucinations stem from the way an LLM learns. The model is trained on vast amounts of data, allowing it to identify patterns and connections between words and concepts. The problem lies in one core LLM function: statistically predicting the most likely word to follow in a sequence. This statistical task can lead LLMs to create content that sounds right but isn't. These deviations can be frustrating and misleading, so understanding the root causes is crucial. Here's a deep dive into the key factors contributing to AI hallucinations, followed by a small sketch that shows the next-word problem in action:
-
Overfitting and random correlations
LLMs are trained on massive datasets using various statistical methods. During training, the model can prioritize memorizing specific patterns in the data to improve short-term accuracy. This practice can lead to overfitting, as discussed in the previous section. Further, statistical analysis may identify correlations between variables that aren't causally connected. An LLM might analyze an airline's dataset of ticket purchases and refunds, and mistakenly conclude that all flight tickets purchased online are refundable when, in fact, they're not.
-
Limitations in training data
Real-world data is messy and incomplete. If an LLM is trained on a dataset that lacks crucial information or diverse perspectives, it might have trouble representing reality accurately. For example, if an LLM were trained solely on English, it might have difficulty recognizing loanwords like smorgasbord. What's more, training data can be noisy, containing errors or biases that the LLM unknowingly absorbs. For instance, a sentiment analysis model trained on data from social media sources might overfit to negative language, leading it to misinterpret neutral statements as being negative.
-
Issues with common sense
While LLMs excel at pattern recognition and statistical analysis, they lack basic common sense. Imagine a RAG conversational AI app tasked with writing a story set in the 1960s. It might generate a scenario where a character uses a cellphone – a device that hadn't been invented yet!
-
Evaluating output relevance
Evaluating the quality and factual grounding of AI outputs can be challenging. For example, an employee querying a company chatbot about vacation time won't find much satisfaction in a response based on publicly available data. And, while a RAG chatbot could provide a generalized response (based on unstructured data in the form of company policy docs), it could only get personal ("You're entitled to 14.5 vacation days") if the LLM were grounded with structured data – at the individual employee level.
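To see how pure next-word prediction can yield fluent nonsense, here's a toy bigram "language model" – a deliberately tiny, hypothetical sketch – that continues a prompt with whatever word most often followed the previous one in its training sentences:

```python
# A toy bigram "language model": it only knows which word most often
# follows which, so it can generate fluent text with no notion of truth.
from collections import Counter, defaultdict

corpus = [
    "the battle of waterloo was fought in belgium",
    "the battle of hastings was fought in england",
    "the battle of waterloo ended napoleon's rule",
]

# Count word -> next-word frequencies across the corpus.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def continue_text(prompt: str, length: int = 6) -> str:
    words = prompt.split()
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break
        # Always pick the statistically most likely next word.
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the battle of hastings was"))
# Depending on tie-breaking, this can come out as
# "the battle of hastings was fought in belgium" -- fluent, grammatical,
# and factually wrong. The model has no idea whether the claim is true.
```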
03
Examples of AI hallucinations
AI, for all its advancements, can sometimes land on Fantasy Island. An AI or RAG hallucination can range from the humorous to the potentially harmful. From misplaced historical events to gravity-defying architecture, here are 10 real-world examples of AI hallucinations where GenAI got a little carried away, to say the least:
-
Historical hiccups
A user asked a GenAI app for a detailed account of the Roman emperor Julius Caesar's assassination. The response wove a thrilling narrative, but mistakenly placed the event in ancient Egypt instead of Rome.
-
Culinary concoctions
An AI recipe generator proposed a mouthwatering dish featuring an interesting ingredient: "luminescent dragonfruit". The fruit, of course, doesn't exist, but sounds exotic and could have easily tricked someone into trying the recipe regardless.
-
Scientific sensationalism
A science news chatbot reported on a groundbreaking discovery in physics, claiming scientists achieved perpetual motion – a concept that defies the laws of thermodynamics. The sensational claim grabbed attention but was clearly false.
-
Lost land legends
A GenAI app tasked with writing travel articles described a mystical lost land hidden in the Amazon rainforest, complete with fantastical flora and fauna. The story was captivating but was a complete fabrication based on myths and legends.
-
Financial fantasy
A stock trading bot, analyzing market trends, recommended investing heavily in a new public company with a revolutionary product – a product that, on further investigation, turned out to be entirely fictitious.
-
Star scandals
An AI-based tabloid churned out a juicy story about a famous singer's secret marriage to a Martian prince. The outlandish tale spread like wildfire on social media before being taken down (since there are no Martian princes).
-
Legal loophole lies
A legal research AI tool, combing through vast amounts of case law, turned up a fabricated Supreme Court decision supporting a specific legal argument. Thankfully it was thrown out, but the fake case could have misled lawyers and potentially disrupted the legal process.
-
Musical mishaps
An AI-based music composer tasked with creating a cover in the style of a famous artist generated a nonsensical song that combined elements from completely different genres – a mega musical mixup.
-
Architectural anomalies
An AI-based urban planning app proposed a revolutionary skyscraper design that defied the laws of physics, appearing to float in mid-air. While visually stunning, the design, of course, was structurally impossible.
-
Linguistic lapses
An AI-based translator couldn’t figure out a Korean idiom and moved it – as is – into another language, resulting in a flagrantly foul phrase.
04
The consequences of AI hallucinations
AI hallucinations, while amusing at times, can also result in far-reaching and serious consequences, depending on the context in which they occur. Here's a breakdown of the potential dangers of AI hallucinations:
-
Misinformation dissemination
AI hallucinations can be used to spread misinformation. Consider manipulated social media posts or news articles containing fake news that appears believable. Such articles could easily be used to cause confusion, erode trust in reliable sources, and manipulate public opinion.
-
Wasted resources
In business settings, AI hallucinations can lead to wasted resources based on bad data. For instance, an AI-based marketing tool might recommend a campaign strategy targeting the wrong demographics due to inaccurate, incomplete data.
-
Erosion of trust
Repeated exposure to AI hallucinations can erode trust in the very concept of artificial intelligence. If users get bad info from chatbots or other AI assistants, they probably won’t rely on them for any task.
-
Safety risks
In critical applications like self-driving vehicles, AI hallucinations could have life-or-death consequences. Imagine an autonomous taxi treating a one-way street as two-way as a result of outdated road maps or GPS information.
-
Amplification of bias
AI hallucinations can amplify existing biases present in training data, leading to discriminatory practices if not addressed. For instance, an AI tool used in recruitment might favor candidates from certain backgrounds due to biases in its training data.
05
The upside of AI hallucinations
While AI hallucinations can be frustrating and misleading, some argue that they also have a positive side in that they can:
-
Spark creativity
Since AI hallucinations often generate unexpected ideas or scenarios, they can challenge us to think outside the box and explore new possibilities.
-
Discover biases in training data
The nonsensical details AI models sometimes make up can be a wake-up call to address biases in the training data and improve its quality.
-
Safeguard against super-intelligence
The unreliability of AI models due to hallucinations prevents them from “taking over the world” – and gives us time to develop safeguards and regulations for AI as it evolves.
-
Ensure job security for real people
The need for a human in the loop due to hallucinations keeps some jobs relevant that might otherwise be automated. This provides a buffer zone as AI capabilities develop.
While AI researchers should, of course, strive to eliminate hallucinations, we can still pause to appreciate their potential benefits.
06
How to spot AI hallucinations
Large Language Models (LLMs) are impressive tools, capable of generating human-quality text and completing complex tasks, but they're not perfect. AI hallucinations can be convincing, so staying vigilant is crucial. Be on the lookout for:
-
Contextually inconsistent answers
A well-constructed response should flow logically and connect to the information provided. Look for inconsistencies. Does the AI-generated response introduce irrelevant details or veer off-topic? Does it contradict established facts or details mentioned earlier in the conversation? If the answer seems like it belongs in a different story altogether, it might very well be an AI hallucination.
-
Statistically implausible data
LLMs are trained on massive datasets, and sometimes they rely on statistical probabilities to fill in the gaps. This practice can lead to answers that are technically possible but highly unlikely. For example, if you ask an LLM to write a biography of a historical figure, it might include awards or achievements that the person never received. While statistically there's a chance anyone could win any award, the specific details should align with the figure's known history.
-
Emotional manipulation
Be wary of responses that tug at your heartstrings or try to evoke a strong emotional response. While AI can learn to mimic human emotions, it might not understand the nuances of emotional context. An overly sentimental story or a dramatic turn of events could be a sign of fabrication.
-
No references or sources
When providing “known” facts, a reliable AI model should be able to provide some basis for its claims. If you're asking about historical events, scientific discoveries, or specific data points, a legitimate answer should reference credible sources. An LLM that simply states a fact without any source might be hallucinating.
-
Focus on trivia
Although RAG GenAI may be great at mimicking human speech patterns, pay close attention to the content. Does the answer include overly specific or technical details that seem out of place? For instance, an LLM referring to a specific customer refund might add obscure details about the kind of container the goods were shipped in. These overly specific flourishes can be a sign of fabrication.
By recognizing these warning signs, you’ll become a more discerning user of GenAI content. Critical thinking is key to separating fact from fiction in the world of AI.
07
How to prevent AI hallucinations
Researchers are actively exploring several key approaches to create a future where LLMs are more grounded in reality. To eliminate AI hallucinations, AI scientists have their eyes on:
-
Data quality
The foundation for reliable AI output lies in the data an LLM is trained on. Thus, researchers are prioritizing high-quality data. This involves techniques like data cleaning to remove errors and inconsistencies, and bias filtering to ensure that the LLM learns from a balanced and truthful perspective. Just as a student performs better when taught from accurate, up-to-date information, an LLM develops a stronger grasp of reality when fed trusted, fresh data.
-
Fact-checking
Even the best data can have gaps, so this is where fact-checking mechanisms come into play. As the LLM generates text, these mechanisms compare the output against real-world knowledge bases like scientific publications or verified news articles. For example, consider a RAG LLM responding to a customer query. The fact-checking system would verify details like dates, figures, and locations found in the customer's private data against publicly available information. Inconsistencies would get flagged, prompting the LLM to revise its response based on verified information (a minimal sketch of this idea appears at the end of this section).
-
Fine-tuning
Beyond data cleansing and fact checking, researchers are also focusing on teaching LLMs to reason better. While current RAG AI models excel at pattern recognition and statistical analysis, fine-tuning would allow them to better judge the reliability of their responses. For example, a cellular operator’s chatbot tasked with providing personalized recommendations to users would employ fine-tuning to identify nonsensical elements, like the user in question subscribing to more than one plan at the same time.
-
Transparency by design
Techniques are being developed to show users the sources the LLM used to generate its response. This transparency lets users assess the trustworthiness of the information, spot potential biases in the training data, and critically evaluate the LLM's reasoning process.
-
Active RAG
A promising approach called Active Retrieval-Augmented Generation (ARAG) provides factual grounding for AI outputs from trusted sources. ARAG uses your private enterprise systems to find relevant information that supplements the LLM's knowledge. This functionality allows the LLM to anchor its responses in real-world data, reducing the risk of fabricated or nonsensical creations.
By combining these approaches, researchers are working towards a future where AI-generated content is grounded. High-quality data, fact-checking, improved reasoning, transparency, and active RAG all play a role in making AI a more reliable and trustworthy resource.
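As promised above, here's a minimal, hypothetical sketch of the fact-checking idea: candidate claims from an LLM response are checked against a trusted knowledge base, and anything unsupported is flagged for revision. (Real systems would use retrieval plus an entailment model rather than exact string matching.)

```python
# Hypothetical post-generation fact check: flag claims that are not
# supported by a trusted knowledge base.
from dataclasses import dataclass
from typing import List

@dataclass
class Claim:
    text: str
    supported: bool

# Stand-in for a verified knowledge base (in practice: curated documents,
# a search index, or an enterprise data product).
KNOWLEDGE_BASE = {
    "the battle of waterloo took place in belgium",
    "refunds are issued within 14 days of cancellation",
}

def is_supported(claim: str) -> bool:
    """Naive check: exact match against the knowledge base. Real systems
    would use retrieval and a natural-language-inference model instead."""
    return claim.strip().lower().rstrip(".") in KNOWLEDGE_BASE

def fact_check(response_sentences: List[str]) -> List[Claim]:
    return [Claim(s, is_supported(s)) for s in response_sentences]

flags = fact_check([
    "The Battle of Waterloo took place in Belgium.",
    "The Battle of Waterloo took place in Tahiti.",
])
for claim in flags:
    status = "OK" if claim.supported else "UNSUPPORTED, ask the LLM to revise"
    print(f"{status}: {claim.text}")
```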
08
What the LLMs are saying about hallucinations
Let’s let the LLMs speak for themselves. We asked Google Gemini, OpenAI ChatGPT, and Microsoft Copilot a simple question: "Have you ever provided incorrect information?"
Their responses reveal the fascinating inner workings of AI and the factors that can lead to an AI or RAG hallucination:
Google Gemini
"It's possible that I have unintentionally provided incorrect information in the past. While I am designed to avoid hallucinations, here's why something inaccurate might have slipped through:
-
Data Shortcomings: My training data, though vast, might have contained errors or biases. If I rely on that information to answer your query, my response could be misleading.
-
Prompt Misinterpretation: Sometimes I might misinterpret your question or miss subtle nuances in your phrasing. This could lead me down the wrong path and generate a response that seems plausible but isn't quite accurate.
-
Statistical Flukes: In rare cases, I might generate an answer that appears factually correct based on the statistical patterns in my training data, but is actually incorrect in the real world.
While I can't say for certain if I've ever provided incorrect information due to these factors, I am constantly being improved. Here's what you can do to help:
-
Ask Clear Questions: The more specific and well-defined your question is, the better I can understand your intent and provide an accurate answer.
-
Fact-Check My Responses: Don't assume everything I say is true. Double-check the information I provide with reliable sources, especially for critical topics.
-
Provide Feedback: If you notice that I've given you incorrect information, let me know! Your feedback helps me learn and improve over time.
By working together, we can ensure that I am a reliable source of information."
OpenAI ChatGPT
"As an AI language model, my responses are generated based on patterns and data from the vast corpus of information I've been trained on. While I don't experience hallucinations or subjective phenomena, there can be instances where the information provided might be inaccurate or misleading due to various factors.
One potential factor is the quality and completeness of the data I've been trained on. While efforts are made to ensure that the training data is diverse and representative, it's impossible to include every possible scenario or piece of information. As a result, there may be gaps or biases in the data that could lead to incorrect responses.
Additionally, the context of the input I receive can influence the accuracy of my responses. If the input is ambiguous or unclear, I might generate a response that doesn't accurately reflect the intended meaning. Similarly, if the input contains incorrect information or is based on faulty assumptions, my response may also be incorrect.
Furthermore, errors in the processing or interpretation of the input can also lead to inaccuracies in my responses. While I strive to interpret and process the input as accurately as possible, there can be instances where mistakes occur, leading to incorrect or nonsensical outputs.
It's essential to approach information provided by any AI with a critical mindset and to cross-reference it with reliable sources to ensure accuracy. If you ever come across any inaccuracies or inconsistencies in my responses, please feel free to point them out, and I'll do my best to provide clarification or correct the information."
Microsoft Copilot
"As an AI language model, I strive to offer accurate and helpful answers based on the vast amount of data and patterns I've learned during my training. However, I am not infallible, and there may be instances where my responses are less accurate or contain errors. These inaccuracies can arise due to various factors:
-
Training Data Limitations: My knowledge is derived from the text data available during my training, which was up to a certain point in 2021. If new information has emerged since then, I might not be aware of it.
-
Ambiguity and Context: Language is inherently ambiguous. Sometimes, context or phrasing can lead to misunderstandings. While I try to infer context, there are cases where I might misinterpret the user's intent.
-
AI Hallucinations: Occasionally, I might generate responses that seem plausible but lack factual accuracy. These "hallucinations" occur due to the creative nature of AI models, where they extrapolate from existing data to create new content.
-
Complex Topics: Some topics are intricate and multifaceted. Providing concise yet accurate information can be challenging, especially when dealing with scientific, legal, or historical matters.
-
Dynamic World: The world is constantly changing. New discoveries, events, and developments occur daily. Keeping up with real-time information is difficult for any model.
Despite these challenges, I continuously learn and adapt. Feedback from users helps me improve, and I appreciate your understanding. If you ever notice any inaccuracies, please feel free to point them out, and I'll do my best to correct them.
In summary, while I strive for accuracy, I encourage critical thinking and verification from reliable sources. After all, even the most advanced AI models have their limitations, much like the imperfections of human knowledge."
09
Reducing AI hallucinations with GenAI Data Fusion
A great way to reduce AI hallucinations is to use the most advanced RAG tool – one that accesses both unstructured and structured data from your own private data sources and uses it to supplement the LLM.
This approach, called GenAI Data Fusion, unifies the structured data of a single business entity – customer, employee, or invoice – from enterprise systems based on a data-as-a-product approach.
Data products enable GenAI Data Fusion to access real-time data from multiple enterprise systems, not just static docs from knowledge bases. With this feature, LLMs can leverage RAG to integrate data from your customer 360 platform, and turn it into contextual prompts. The prompts are fed into the LLM together with the user’s query, enabling the LLM to generate a more accurate and personalized response.
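As a rough sketch of what turning structured data into a contextual prompt can look like (hypothetical field names and record, not K2View's actual API), a unified employee record is serialized into the prompt alongside the user's question:

```python
# Hypothetical sketch: ground an LLM prompt in a unified, structured
# business-entity record (e.g., a single employee's data product).
employee_record = {  # in real life, fetched from the data product layer
    "employee_id": "E-1042",
    "name": "Dana Levi",
    "vacation_days_remaining": 14.5,
    "plan": "Full-time, EU policy",
}

user_query = "How many vacation days do I have left?"

context_lines = [f"{field}: {value}" for field, value in employee_record.items()]
prompt = (
    "You are an HR assistant. Answer using ONLY the employee record below. "
    "If the record does not contain the answer, say so.\n\n"
    "Employee record:\n" + "\n".join(context_lines) +
    f"\n\nQuestion: {user_query}"
)

# `prompt` is then sent to the LLM together with the user's query,
# letting it answer personally ("You have 14.5 vacation days left")
# instead of falling back on generic, public information.
print(prompt)
```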
K2View’s data product platform lets RAG access data products via API, CDC, messaging, or streaming – in any variation – to unify data from many different source systems. A data product approach can be applied to various RAG use cases to:
-
Resolve problems quicker.
-
Institute hyper-personalized marketing campaigns.
-
Personalize up-/cross-sell suggestions.
-
Identify fraud by detecting suspicious activity in user accounts.
Reduce AI hallucinations with the market-leading RAG tool – GenAI Data Fusion by K2view.
AI hallucination FAQs
What are AI hallucinations?
AI hallucinations are generative AI responses that are untrue, nonsensical, or inconsistent – because of flawed training data or poorly identified patterns.
What causes AI hallucinations?
AI hallucinations stem from the way LLMs learn. The LLM is trained on huge volumes of data, enabling it to identify connections and patterns between concepts and words. The problem boils down to an LLM's core task of predicting the most likely word to follow in a sequence – which sometimes leads it to create content that sounds right but is actually wrong.
What are some examples of AI hallucinations?
Here are 10 real-world examples of AI hallucinations:
- Architectural anomalies
An AI-assisted urban planning app designing a skyscraper that floats in mid-air.
- Culinary concoctions
An AI-based recipe maker presenting a dish featuring a nonexistent "luminescent dragonfruit".
- Financial fantasy
A stock broker bot recommending a new public company with an entirely fictitious product.
- Historical hiccups
An AI-based historical app mistakenly placing Julius Caesar in ancient Egypt instead of Rome.
- Legal loophole lies
An AI-based legal research assistant discovering and red-flagging a fabricated Supreme Court decision.
- Linguistic lapses
An AI-based translator placing a Korean term as is into another language, resulting in a swear word.
- Lost land legends
A generative AI app that writes travel articles describing a mystical lost land hidden in the Amazon rainforest.
- Musical mishaps
An AI-based music composer mismatching elements from different periods and styles.
- Scientific sensationalism
A news chatbot claiming that scientists achieved perpetual motion – a concept that defies the laws of physics.
- Star-studded scandals
An AI-based tabloid reporting on a famous singer's secret marriage to a Martian prince.
What are the consequences of AI hallucinations?
Here are the consequences of AI hallucinations:
- Bias amplification
An AI hallucination can amplify existing biases found in training data, leading to poor practices if not addressed.
- Safety risks
AI hallucinations could have life-or-death consequences in critical applications like self-driving vehicles.
- Misinformation dissemination
Imagine manipulated social media posts of fake news designed to cause confusion and manipulate public opinion.
- Trust erosion
If you get bad info from a company chatbot, you won't be using it again.
- Wasted resources
Consider an AI-based marketing tool recommending a strategy targeting the wrong demographics due to inaccurate, incomplete data.
How do you spot AI hallucinations?
AI hallucinations are convincing by design, so beware of:
- Contextually inconsistent answers
Look for inconsistencies. Does the response include irrelevant details or go off-topic? Does it contradict facts or any of the details discussed earlier in the conversation?
- Emotional manipulation
While a generative AI app can try to sound emotional, it lacks the contextual nuances of emotion. An overly dramatic or sentimental response might be a sign of fabrication.
- No references or sources
When queried about historical events, scientific inventions, or specific data points, the answer should include credible sources – otherwise it might be a hallucination.
- Statistically implausible data
Because LLMs are trained on massive amounts of data, they often resort to statistical probabilities to fill in any gaps – leading to answers that may be possible but are unlikely.
- Trivial pursuits
Even if a generative AI app can mimic human speech patterns well, are its outputs direct and true? Or do the responses over-specify or focus on details that appear out of place?
Critical independent thinking separates fact from fiction in the world of generative AI.
How do you eliminate AI hallucinations?
To prevent AI hallucinations, AI researchers are looking at:
- Data quality
AI scientists are focusing on improving data quality, through more comprehensive data cleansing and better bias filtering techniques.
- Fact-checking
Fact-checking mechanisms compare AI responses to real-life knowledge bases such as scientific publications, verified news articles, and company information.
- Fine-tuning
While current generative AI apps may be great at pattern recognition and statistical analysis, fine-tuning allows them to better judge the reliability of their responses.
- Transparency by design
Referencing and sourcing let users assess the value and truth of the information as well as detect possible biases in the training data.
- Active RAG
Active Retrieval-Augmented Generation grounds LLMs in trusted data – from publicly available and private company sources – to drastically reduce hallucinations.
Better quality data, fact-checking, fine-tuning, transparency, and active RAG combine to make AI a more reliable and trustworthy resource.