Was Beeb Birtles born in Amsterdam?
To answer the question factually, yes: Beeb Birtles, the Australian musician, singer, songwriter and guitarist, was born in Amsterdam on November 28, 1948. But if you ask an LLM, you may very well get a wrong answer. For example, this is what LLMSynthesize produces when using the OpenAI LLM:
In[]:=
Table[LLMSynthesize["Was Beeb Birtles born in Amsterdam"],5]//Column
Out[]=
No, Beeb Birtles was not born in Amsterdam. He was born on October 28, 1948, in the city of Melbourne, Australia. He is known as a musician and a member of the Australian band Little River Band.
No, Beeb Birtles was not born in Amsterdam. He was born on 28th March 1948 in Melbourne, Australia. He is known as a musician, particularly as a member of the band Little River Band.
No, Beeb Birtles was not born in Amsterdam. He was born on June 28, 1948, in Georgina, Ontario, Canada. He is known for being a musician and a member of the Australian band Little River Band.
No, Beeb Birtles was not born in Amsterdam. He was born in Melbourne, Australia, on September 28, 1948. He is known as a musician and a member of the band Little River Band.
No, Beeb Birtles was not born in Amsterdam. He was born in Melbourne, Australia, on June 28, 1948. Birtles is known as a musician and a member of the Australian band Little River Band.
Note that the results are consistently wrong, and that one answer even claims a birthplace of Georgina, Ontario, Canada. The problem is that LLMs do not (yet) encode all the facts they encounter during training, and this is a good example of a gap in their knowledge. Beeb Birtles (birth name Gerard Bertelkamp) was born in Amsterdam but emigrated to Australia in September 1959 and has lived there ever since. This sort of detailed fact is, however, available to any Wolfram Language user via the carefully curated Wolfram Knowledgebase.
In[]:=
Beeb Birtles PERSON [{"BirthPlace", "BirthDate"}]
Out[]=
{Amsterdam, Sun 28 Nov 1948}
To address this very common shortcoming in the factual knowledge of LLMs, a family of methods called Retrieval-Augmented Generation (RAG) has been developed. RAG is a technique that enhances the capabilities of language models by integrating an external data retrieval step. Instead of relying solely on information pre-encoded within the model, a RAG system dynamically fetches relevant facts from databases or other reliable sources to generate more accurate responses. This approach helps bridge the gap between the static knowledge encoded in a model’s parameters and the ever-evolving body of human knowledge.
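It helps to see this retrieval idea in its most basic form first: look up a fact with the Knowledgebase and simply prepend it to the prompt. Here is a minimal, hand-written sketch of that (the prompt wording is my own, and it assumes the same OpenAI LLM setup as above):
In[]:=
birthPlace = Interpreter["Person"]["Beeb Birtles"]["BirthPlace"];
LLMSynthesize[StringJoin[
  "Fact from the Wolfram Knowledgebase: Beeb Birtles was born in ",
  CommonName[birthPlace],
  ". Using this fact, answer the question: Was Beeb Birtles born in Amsterdam?"]]
With the fact injected into the prompt, the LLM no longer has to rely on its unreliable internal memory. As we will see below, LLMPromptGenerator automates exactly this kind of injection.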
These RAG methods examine the user prompt, in this case the question “Was Beeb Birtles born in Amsterdam”, and try to retrieve related facts from various sources. This can be a semantic search index, like the Semantic Index of Wolfram Language Documentation, which (semantically) indexes the documentation for the Wolfram Language. But using the Knowledgebase is also a great way to improve LLMSynthesize results. And the best way to use the Knowledgebase with LLMSynthesize is with the LLMPromptGenerator functionality.
Here is a simple example that shows how LLMPromptGenerator can improve the results of LLMSynthesize. Without LLMPromptGenerator, a question like “what day is today?” is impossible to answer, since this sort of information cannot be statically encoded into a neural network, so you typically get a random date from the past:
In[]:=
LLMSynthesize["what day is today?"]
Out[]=
Today is October 28, 2023.
Since I did not actually evaluate this input back in 2023, this result is clearly incorrect. But with LLMPromptGenerator you can give LLMSynthesize some help by providing it with up-to-date information:
In[]:=
generator = LLMPromptGenerator[{"Current date/time: ", DateString[]} &]
LLMSynthesize["what day is today?", LLMEvaluator -> <|"Prompts" -> generator|>]
Out[]=
LLMPromptGenerator[Input Specification: Input, Function: {"Current date/time: ", DateString[]} &]
Out[]=
Today is Friday, December 20, 2024.
This correctly corresponds to the date on which this input was evaluated.
We can use a slightly more sophisticated LLMPromptGenerator to help LLMSynthesize give the correct answer for the birthplace of Beeb Birtles. The first step is to run the user prompt through TextCases, which scans a piece of text for possible text content types known to the Knowledgebase. In this case I ask TextCases to look for the “City” and “Person” text content types, which are only two of hundreds of possible entity types. I then ask TextCases to interpret these cities and persons as entities. What you end up with is the following:
In[]:=
assoc=TextCases["Was Beeb Birtles born in Amsterdam",{"City","Person"}->"Interpretation"]
Out[]=
<|"City" -> {Amsterdam}, "Person" -> {Beeb Birtles}|>
This is great because it found exactly the entities that are useful for answering this question.
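The same call works with any of the other supported text content types; for example, you could scan for countries and dates instead (a hedged variation of the call above; the exact entities returned depend on how TextCases interprets the text):
In[]:=
TextCases["Beeb Birtles emigrated to Australia in September 1959", {"Country", "Date"} -> "Interpretation"]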
For all the entities that were found, you can now get a full Dataset (or an Association). Note how this data includes the date and place of birth for Beeb Birtles:
In[]:=
EntityValue[Flatten@Values[assoc],"Dataset"]
Out[]=
Amsterdam
  administrative region: North Holland, Netherlands
  (most other properties elided; 110 total)
Beeb Birtles
  astrological sign: Sagittarius
  date of birth: Sun 28 Nov 1948
  place of birth: Amsterdam
  Chinese zodiac sign: Rat (Yang Earth)
  (most other properties elided; 49 total)
We can now create a slightly more intricate prompt generator to help answer the question correctly. I have added two Echo statements, which show in more detail what goes on when LLMSynthesize actually uses this prompt generator:
In[]:=
generator = LLMPromptGenerator[
  Function[
    Echo[#];
    With[{assoc = Echo@TextCases[#Input, {"City", "Person"} -> "Interpretation"]},
     EntityValue[Flatten@Values[assoc], "Association"]]
  ],
  {"Input"}]
Out[]=
LLMPromptGenerator[Input Specification: {Input}, Function: (Echo[#1]; With[{assoc = Echo[TextCases[#Input, {"City", "Person"} -> "Interpretation"]]}, EntityValue[Flatten[Values[assoc]], "Association"]]) &]
And with this prompt generator, all of a sudden LLMSynthesize gives the correct answer! Note that the prompt generator function receives an association containing the original question under the “Input” key. This input is then run through TextCases, and finally the entity values are fed, as an association, to LLMSynthesize:
In[]:=
LLMSynthesize["Was Beeb Birtles born in Amsterdam",LLMEvaluator-><|"Prompts"->generator|>]
» <|"Input" -> "Was Beeb Birtles born in Amsterdam"|>
» <|"City" -> {Amsterdam}, "Person" -> {Beeb Birtles}|>
Out[]=
Yes, Beeb Birtles was born in Amsterdam.
To summarize, using the Wolfram Knowledgebase in conjunction with tools like LLMPromptGenerator can significantly enhance the accuracy of responses generated by language models. By supplementing the model’s existing capabilities with reliable data retrieval methods, we can address gaps in factual knowledge effectively. This approach not only improves the model’s performance on specific queries but also showcases the synergy between AI language models and structured data resources. In practice, integrating these tools allows users to leverage the strengths of both systems, ensuring more reliable and accurate outputs.

CITE THIS NOTEBOOK

Combatting LLM "fibs" with fact-injection from the Wolfram Knowledgebase​
by Arnoud Buzing​
Wolfram Community, STAFF PICKS, December 20, 2024
​https://community.wolfram.com/groups/-/m/t/3342675