In his article ‘Generative AI Space and the Mental Imagery of Alien Minds’ , Stephen writes that concepts can be thought of as islands in a large space “and between these islands there lie huge expanses of what we might call ‘inter-concept space’”. My project focuses on exploring these spaces in a linguistic context, essentially investigating that if we consider two pieces of text as points in space, what lies between them- and does it hold any meaning? The inspiration for the project is the process of ‘text diffusion’, wherein a clean text is corrupted and a model is trained to return the original text when provided a corrupted one. Through my investigation, I found that while it’s quite hard to obtain the text that lies in the exact space between two other texts, it is possible to manually combine the texts in different combinations- and in that way understand what lies between them.

Introduction

The general aim of my research was to explore the space that is created from blending two different pieces of text. The investigation began with Semantic Search Indices, which I used to assign vectors to all pieces of writing I worked with. I began with the simplest way of creating inter-concept space, by blending the vectors embeddings of two paragraphs from an article using weighted sum to create a new one. An existing paragraph that corresponded most closely to the new paragraph found using the Nearest Neighbours function became the foundation of generating new text, which was done in 3 ways. I applied Markov Chain on the existing paragraph- which uses n-grams to generate text based on the word pool it’s given. LLM calls were applied on both the existing paragraph and the Markov generated text to rephrase them, thereby producing more ‘original’ texts.’ After initially using paragraphs as my concepts, I also repeated the investigation with sentences to expand the universe and allow for better and closer nearest neighbours. Additionally, I tried feature extraction, wherein I extracted sentiment, topics and structure from the matched items and only provided those to the LLM for more ‘original’ final text. In the second phase of the project, I used the neural network BERT and relied on ability to interpret. Instead of creating a search index, I created vector embeddings within BERT, and allowed it to find nearest neighbours based on its own understandings. This was attempted with both single words and paragraphs as the concepts.

Setup

Here you can get the Paclet ‘LinguisticInterconcepts’ I created with all the functions developed in the project:

In[]:=

CloudGet@CloudObject[

https://www.wolframcloud.com/obj/dagarwal14/WSRP25/LinguisticInterconcepts.wl

]

These are the functions it consists of:

In[]:=

ResourceFunction["GridTableForm"][{#,Information[#,"Usage"]}&/@Names["LinguisticInterconcepts`*"],TableHeadings->{"Function name","Description"}]

Out[]=

#	Function name	Description
1	BlendVector	BlendVector[index, y] blends semantic vectors in the index.
2	GenerateMarkovText	GenerateMarkovText[text, maxSentences] generates new Markov-style text.
3	LLMRephraseByFeatures	LLM rephrase by sentence features.
4	MakeIndex	MakeIndex[text_String, opts] creates index from text.
5	MakeNGramTrie	MakeNGramTrie[text_String, ngram_Integer] generates new Markov-style text.
6	RephraseAndShow	RephraseAndShow[text] rephrases text using LLM and prints both versions.
7	SentencePOS	SentencePOS[s] gives a list of POS rules for the given sentence.

What is a concept?

The objective of exploring inter-concept space raises the question of what a concept even constitutes. The Oxford Languages dictionary describes the noun “concept” as ‘An abstract idea; a general notion.’ Interestingly enough, in the philosophical sense the dictionary describes it as ‘an idea or mental picture of a group or class of objects formed by combining all their aspects’. By this definition, a paragraph or a sentence is actually a better representation of a concept than a word. When discussing inter-concept space in the context of visuals, a concept is considered to be an image- hence image diffusion is considered an accurate exploration of the same. My argument is that an image in its simple sense is a collection of pixels- the same way a sentence or paragraph is a collection of words. A word only holds any meaning when defined through a sentence or paragraph- which means it can be said that all these forms of text are simply different ways of representing the same concept. Subsequently, I believe that is it acceptable that the linguistic analog of inter-concept space or ‘text diffusion’ can in fact be explored through paragraphs/sentences. Some might argue that pixels do not inherently hold any meaning of their own- while words do. This is true. The ultimate idea is that words too are concepts, but so are a collection of words.

Semantic Search Index

It is essential (and convenient and fast) to begin with creating an index, as it allows us to store the data in the desired form, in this case paragraphs and create an easily searchable dataset. More importantly, creating an index assigns vector embeddings to each data item, basically represents them as numbers.

Get the text:

In[]:=

text=Import["https://writings.stephenwolfram.com/2023/07/generative-ai-space-and-the-mental-imagery-of-alien-minds/","Plaintext"];

Split into paragraphs:

In[]:=

paragraphs=Select[StringTrim/@StringSplit[text,RegularExpression["\\n\\h*\\n"]],StringLength[#]>0&];Length[paragraphs]

Out[]=

103

Create a semantic search index with the paragraphs:

In[]:=

index=CreateSemanticSearchIndex[paragraphs];

Assign a variable to hold the vector embeddings:

In[]:=

embeddings=Normal[index["Embeddings"]];embeddings//Length

Out[]=

124

Vector blending

The merging of any two paragraphs can easily be done using weighted sum, because all the paragraphs have been represented as a set on numbers. If our dataset included all sentences ever written, nearest neighbour would provide us with an exact match to the new vector. Of course since that isn’t possible, we have reduced our ‘universe’ to Stephen’s article in this investigation.

Map the vector embeddings to the indices using a nearest neighbours function :

In[]:=

nfVecs=Nearest[embeddings->"Index"];

Randomly pick two vectors from the embeddings:

In[]:=

V3=RandomChoice[embeddings];V4=RandomChoice[embeddings];

Blend and create a new vector using weighted sum (an interpolation weight value of 0.5 evenly combines both vectors):

In[]:=

y=0.5;Vnew=V3+y*V4;Vnew//Length

Out[]=

384

Extract the index positions of both original vectors:

In[]:=

posV3=nfVecs[V3]//FirstposV4=nfVecs[V4]//First

Out[]=

We now use nearest neighbours function to find existing paragraphs in the text that correspond most closely to the new bended vector, excluding the original vectors themselves:

In[]:=

nns=Complement[nfVecs[Vnew,5],{posV3,posV4}]

Out[]=

{55,64,66}

In[]:=

index["Items"][[nns//First]]

Out[]=

Minds in Rulial Space We can think of what we’ve done so far as exploring some of the “natural history” of what’s out there in generative AI space—or as providing a small taste of at least one approximation to the kind of mental imagery one might encounter in alien minds. But how does this fit into a more general picture of alien minds and what they might be like? With the concept of the ruliad we finally have a principled way to talk about alien minds —at least at a theoretical level. And the key point is that any alien mind—or, for that matter, any mind—can be thought of as “observing” or sampling the ruliad from its own particular point of view, or in effect, its own position in rulial space.

View the original paragraphs themselves for comparison:

In[]:=

index["Items"][[posV3]]

Out[]=

It’s a nontrivial fact of physics that “pure motion” in physical space is possible ; in other words, that an “object” can be moved “without change” from one place in physical space to another. And now, in a sense, we’re asking about pure motion in rulial space : can we move something “without change” from one mind at one place in rulial space to another mind at another place? In physical space, things like particles—as well as things like black holes—are the fundamental elements that are imagined to move without change. So what’s now the analog in rulial space? It seems to be concepts—as often, for example, represented by words.

In[]:=

index["Items"][[posV4]]

Out[]=

In a sense the whole arc of the intellectual development of our civilization can be thought of as corresponding to an expansion in rulial space: with us progressively being able to think in new ways, and about new things. And as we expand in rulial space, we are in effect encompassing more of what we previously would have had to consider the domain of an alien mind. When we look at images produced by generative AI away from the specifics of human experience—say in interconcept space, or with modified rules of generation—we may at first be able to make little from them. Like inkblots or arrangements of stars we’ll often find ourselves wanting to say that what we see looks like this or that thing we know.

The BlendVector function in the package can simply be provided with the index variable and an interpolation weight value to return the original and new paragraphs:

BlendVector[index_,y_:0.5]

Markov Chain Text Generation

Prefix trees and Markov Chain work together to analyze a text, understand how frequently the words appear in relation to each other and then generate new text based on these findings. Since the nearest neighbour paragraphs aren’t exact matches to the blended vectors, Markov Chain will be used to generate more ‘original’ text. An existing paclet ‘TriesWithFrequencuies’ was edited to create functions for the LinguisticInterconcepts paclet.

Given the general larger text as a and a n-gram integer, the function MakeNGramTrie generates a prefix tree with predictions about the text flow:

In[]:=

tr=MakeNGramTrie[text,3];LeafCount[tr]

Out[]=

34311

The GenerateMarkovText function is then given the prefix tree, desired number of sentences and starting seed words to generate new text:

In[]:=

finaltext=GenerateMarkovText[tr,2,{"mental","imagery"}]

Out[]=

Mental imagery of alien” neural net like a human brain suffers a stroke in a very recognizable“ real-world-inspired” images— perhaps like a way to 3. Here are not anatomically correct) line from an array of possible images, essentially mathematical pattern.

Applying LLM Calls

Since neither the nearest neighbour paragraph or Markov Chain generated text are entirely original, LLMs can be used for rephrasing . This is particularly essential for the Markov Chain text as it is syntactically correct but essentially meaningless.

We create a function that uses the LLMFunction and add the specific command we want to implement:

In[]:=

rephraseLLM=LLMFunction["Rephrase the following paragraph to make it more clear and concise:\n\n```text\n`1`\n```"];

Then we apply the function on any text we have generated.

Nearest existing paragraph to blended vector:

In[]:=

new=index["Items"][[nns//First]];rephrased=rephraseLLM[new]

Out[]=

**Minds in Rulial Space** So far, we've explored the "natural history" of generative AI, offering a glimpse into the mental imagery that might exist in alien minds. But how does this relate to our understanding of these minds? The concept of the ruliad provides a theoretical framework for discussing alien minds. Essentially, any mind—alien or otherwise—can be viewed as "observing" or sampling the ruliad from its unique perspective within rulial space.

Markov chain text:

In[]:=

rephrasedmarkov=rephraseLLM[finaltext];

Both original and rephrased texts are then outputted with corresponding titles:

In[]:=

Grid[{{"Original Paragraph:",new},{"Rephrased Paragraph:",rephrased},{"Original Markov chain text:",finaltext},{"Rephrased Markov chain text:",rephrasedmarkov}},Dividers->All,Alignment->Left]

Out[]=

Original Paragraph:	Minds in Rulial Space We can think of what we’ve done so far as exploring some of the “natural history” of what’s out there in generative AI space—or as providing a small taste of at least one approximation to the kind of mental imagery one might encounter in alien minds. But how does this fit into a more general picture of alien minds and what they might be like? With the concept of the ruliad we finally have a principled way to talk about alien minds —at least at a theoretical level. And the key point is that any alien mind—or, for that matter, any mind—can be thought of as “observing” or sampling the ruliad from its own particular point of view, or in effect, its own position in rulial space.
Rephrased Paragraph:	Minds in Rulial Space So far, we've explored the "natural history" of generative AI, offering a glimpse into the mental imagery that might exist in alien minds. But how does this relate to our understanding of these minds? The concept of the ruliad provides a theoretical framework for discussing alien minds. Essentially, any mind—alien or otherwise—can be viewed as "observing" or sampling the ruliad from its unique perspective within rulial space.
Original Markov chain text:	Mental imagery of alien” neural net like a human brain suffers a stroke in a very recognizable“ real-world-inspired” images— perhaps like a way to 3. Here are not anatomically correct) line from an array of possible images, essentially mathematical pattern.
Rephrased Markov chain text:	Mental imagery of an alien neural network resembles a human brain experiencing a stroke, producing recognizable, real-world-inspired images. These images, while not anatomically accurate, are derived from a mathematical pattern of possible visuals.

Larger Item Pool

Repeating the investigation with the same article but distributing it into sentences essentially allows us to expand our universe. The availability of more items in the index and fewer ideas in one item (a sentence) can significantly improve the accuracy of the nearest neighbours matched with the blended vector. There is a much narrower scope to misinterpret the main context and miss significant words in a sentence when compared to a paragraph.

The ‘LinguisticInterconcepts’ paclet can be used to repeat the experiment with a larger item pool- ie. sentences.

In[]:=

text=Import["https://writings.stephenwolfram.com/2023/07/generative-ai-space-and-the-mental-imagery-of-alien-minds/","Plaintext"];StringLength[text]

Out[]=

55439

In[]:=

indexS=MakeIndex[text,"Granularity"->"Sentences"];

In[]:=

embeddingsS=Normal@indexS["Embeddings"];Dimensions[embeddingsS]

Out[]=

{448,384}

In[]:=

resultS=BlendVector[indexS,0.5]

In[]:=

V1=embeddingsS[[5]];V2=embeddingsS[[12]];resultS=BlendVector[indexS,V1,V2,0.4]

In[]:=

textS=resultS["MatchedItem"]

Out[]=

What are all these things?

In[]:=

rephrasedS=RephraseAndShow[textS]

Out[]=

What are all of these items?

In[]:=

textMRKV=GenerateMarkovText[tr,3,{"The","idea","is"}]

Out[]=

The idea is that none just look like they’re“ exteriors” of the ruliad from its neural net that’s for”, there are the fundamental physical laws it does not. But we get images. There are islands there lie huge expanses of what a neural net to us. But— or alien” by year 2025 2024 2023 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2004 2003 all © stephen wolfram language& communication life& times life science computational thinking differently” is to say that are useful in exploring further from a description, like the effect encompassing more alien: and roughly similar” attractor or how these seem to operate.

In[]:=

rephrasedMRKV=RephraseAndShow[textMRKV]

Out[]=

The concept is that what we perceive as the "exteriors" of the ruliad, derived from its neural network, does not encompass the fundamental physical laws. However, we do receive images that represent vast areas of what a neural network can show us. By the years 2025, 2024, 2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2004, and 2003, we can see how these images can be both familiar and alien. The idea is that these images are useful for further exploration beyond mere descriptions, revealing effects that are more complex and similar to attractors, and shedding light on how they seem to function.

Feature Extraction

The ultimate goal of text diffusion is to produce an original text that is a blend of the two input texts. While vector embeddings made it easy to blend the texts, the difficult part has been producing the new text- since our ‘universe’ is too small to find an entirely accurate nearest neighbour. One method to produce text that is both more original and closely aligned to the blended vector is to provide a LLM with the text features, instead of the text itself. This allows for the idea and meaning of the text to be captured without simply rephrasing it.

We extract the topic and sentiment of the Markov chain generated text:

In[]:=

topic=Classify["FacebookTopic"][textMRKV]sen=Classify["Sentiment"][textMRKV]

Out[]=

Politics

Out[]=

Neutral

To extract the parts of speech in the text we use the SentencePOS function from the paclet:

In[]:=

textstructure=SentencePOS[textMRKV]

Out[]=

{TheDeterminer,ideaNoun,isVerb,thatPreposition,noneNoun,justAdverb,lookVerb,likePreposition,theyPronoun,’reVerb,“Punctuation,exteriorsNoun,”Punctuation,ofPreposition,theDeterminer,ruliadNoun,fromPreposition,itsPronoun,neuralAdjective,netNoun,thatWhDeterminer,’sVerb,forPreposition,”Punctuation,,Punctuation,therePronoun,areVerb,theDeterminer,fundamentalAdjective,physicalAdjective,lawsNoun,itPronoun,doesVerb,notAdverb,.Punctuation,ButConjunction,wePronoun,getVerb,imagesNoun,.Punctuation,TherePronoun,areVerb,islandsNoun,therePronoun,lieVerb,hugeAdjective,expansesNoun,ofPreposition,whatWhPronoun,aDeterminer,neuralAdjective,netNoun,toPreposition,usPronoun,.Punctuation,ButConjunction,—Punctuation,orConjunction,alienAdjective,”Punctuation,byPreposition,yearNoun,2025 2024 2023 2022Numeral,2021 2020 2019 2018Numeral,2017 2016 2015 2014Numeral,2013 2012 2011 2010Numeral,2009 2008 2007 2006Numeral,2004ProperNoun,2003ProperNoun,allDeterminer,©Adjective,stephenNoun,wolframAdjective,languageNoun,&Conjunction,communicationNoun,lifeNoun,&Conjunction,timesNoun,lifeNoun,scienceNoun,computationalVerb,thinkingVerb,differentlyAdverb,”Punctuation,isVerb,toPreposition,sayVerb,thatWhDeterminer,areVerb,usefulAdjective,inPreposition,exploringVerb,furtherAdverb,fromPreposition,aDeterminer,descriptionNoun,,Punctuation,likePreposition,theDeterminer,effectNoun,encompassingVerb,moreAdverb,alienAdjective,:Punctuation,andConjunction,roughlyAdverb,similarAdjective,”Punctuation,attractorNoun,orConjunction,howWhAdverb,theseDeterminer,seemVerb,toPreposition,operateVerb,.Punctuation}

We provide the LLMRephraseByFeatures function in the paclet with the extracted features (in the following order) to generate the final text:

In[]:=

LLMRephraseByFeatures[textstructure,topic,sen]

Out[]=

The idea is that there are fundamental physical laws, and we get images like the effect encompassing roughly similar attractors to how these seem to operate in politics.

Remark: Feature extraction can also be performed on the matched item itself, instead of generating Markov chain text first.

Using BERT as an LLM

An important realization at this point in the project was that LLMs such as BERT which we were using for the semantic search index already have knowledge and understanding of their own. Hence it’s unnecessary to create a search index and limit the LLM to the text it can search and use. Instead, we can simply provide the neural network with a narrower universe and rely on its ability to interpret it.

Sentence embedding in BERT

Obtain a custom sentence embedding network that converts text strings into sentence vectors by appending a pooling layer to the pretrained BERT model:

In[]:=

sentenceEmbeddingBERT=NetAppend[NetModel["BERT Trained on BookCorpus and Wikipedia Data"],"pooling"SequenceLastLayer[]]

Out[]=

NetChain

Inputport:	string
Outputport:	vector(size: 768)

Data not saved. Save now



Each sentence vector has 768 dimensions :

In[]:=

sentenceEmbeddingBERT["What is the population of Brazil?"]//Length

Out[]=

768

Single words as concepts

Here, we are using animal names as the dataset, as they are easy for the network to understand interpret.

Obtain the data of animal names:

In[]:=

animalNames=ToLowerCase@

;animalNames//Length

Out[]=

645

Create vector embeddings for the data:

In[]:=

AbsoluteTiming[embeddings=sentenceEmbeddingBERT/@animalNames;]

Out[]=

{22.7552,Null}

In[]:=

Dimensions[embeddings]

Out[]=

{645,768}

Map the vector embeddings to the indices using a nearest neighbours function :

In[]:=

nfVecs=Nearest[embeddings->{"Distance","Index"}];

Test the function by finding the nearest neighbours for a single animal name:

In[]:=

res=nfVecs[Normal@sentenceEmbeddingBERT["dog"],3]

Out[]=

{{0.,162},{2.20781,267},{2.43475,618}}

In[]:=

ResourceFunction["GridTableForm"][Transpose[{res[[All,1]],animalNames[[res[[All,2]]]]}],TableHeadings->{"Distance","Text"}]

Out[]=

#	Distance	Text
1	0.	dog
2	2.20781	horse
3	2.43475	wild dog

Merging words

Use weighted sum to create a blended vector of two animal names:

In[]:=

th=0.5;{w1,w2}={"cat","fish"};v1=sentenceEmbeddingBERT[w1];v2=sentenceEmbeddingBERT[w2];v=v1*th+(1-th)*v2;res=nfVecs[v,4];(*res=Select[res,!MemberQ[{w1,w2},animalNames[[#[[2]]]]]&]*)

In[]:=

ResourceFunction["GridTableForm"][Transpose[{res[[All,1]],animalNames[[res[[All,2]]]]}],TableHeadings->{"Distance","Text"}]

Out[]=

#	Distance	Text
1	1.94599	cat
2	1.94599	fish
3	2.25782	angelfish
4	2.27111	snake

Remark: We do not filter out the blended concept-words in order to be able to visually verify the correctness of the nearest neighbor finding -- the blended concept-words should be in the result by Nearest. (See the commented out code for how such filtering can be done.)

Including the definitions

In the beginning I discussed how paragraphs and words are essentially just different ways of representing the same concept. Adhering to that idea, the inter-concept space of animals can also be explored using definitions of the animals.

Import the animal dictionary data:

In[]:=

animalDict=CloudGet@

https://www.wolframcloud.com/obj/dagarwal14/WSRP25/AnimalDictionary.json

;animalDict=ToString[animalDict,CharacterEncoding->"UTF-8"];aAnimals=Association/@ImportString[animalDict,"JSON"][[1,2]];aAnimals=Association@Map[#name->#summary&,aAnimals];

This is what the data looks like:

In[]:=

SeedRandom[11];Short/@RandomSample[aAnimals,3]

Out[]=

Finnish SpitzA Finnish Spitz (Finnish language: suomenpystykorv… has been the national dog of Finland since 1979.,EchidnaEchidnas (), sometimes known as spiny anteaters, b… was aquatic, but echidnas adapted to life on land.,MoleMoles are small mammals adapted to a subterranean …lant roots, and providing prey for other wildlife.

In[]:=

aAnimals=StringTrim/@aAnimals;aAnimals=StringReplace[#,("\n"..)->"\n"]&/@aAnimals;

The data is now searchable by animal name :

In[]:=

aAnimals["Cat"]

Out[]=

The cat (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae and is often referred to as the domestic cat to distinguish it from the wild members of the family. A cat can either be a house cat, a farm cat or a feral cat; the latter ranges freely and avoids human contact. Domestic cats are valued by humans for companionship and their ability to hunt rodents. About 60 cat breeds are recognized by various cat registries.The cat is similar in anatomy to the other felid species: it has a strong flexible body, quick reflexes, sharp teeth and retractable claws adapted to killing small prey. Its night vision and sense of smell are well developed. Cat communication includes vocalizations like meowing, purring, trilling, hissing, growling and grunting as well as cat-specific body language. It is a solitary hunter but a social species. It can hear sounds too faint or too high in frequency for human ears, such as those made by mice and other small mammals. It is a predator that is most active at dawn and dusk. It secretes and perceives pheromones.Female domestic cats can have kittens from spring to late autumn, with litter sizes ranging from two to five kittens. Domestic cats are bred and shown at events as registered pedigreed cats, a hobby known as cat fancy. Failure to control breeding of pet cats by spaying and neutering, as well as abandonment of pets, resulted in large numbers of feral cats worldwide, contributing to the extinction of entire bird species and evoking population control.Cats were first domesticated in the Near East around 7500 BC. It was long thought that cat domestication was initiated in Ancient Egypt, as since around 3100 BC veneration was given to cats in ancient Egypt.As of 2017, the domestic cat was the second-most popular pet in the United States by number of pets owned, after freshwater fish, with 95 million cats owned. In the United Kingdom, around 7.3 million cats lived in more than 4.8 million households as of 2019.

We can not use the Wolfram Function Repository BERT function “SentenceBERTEmbedding” for sentence embedding as it uses HTTPS call to HaggingFace which is a slow process and gives HTTP Error 429:

(*ResourceFunction["SentenceBERTEmbedding"]["dog"]*)

Create vector embeddings and map them to the indices using nearest neighbours function :

In[]:=

AbsoluteTiming[embeddingsNames=sentenceEmbeddingBERT/@Values[aAnimals];]

Out[]=

{253.486,Null}

In[]:=

Dimensions[embeddingsNames]

Out[]=

{599,768}

In[]:=

nfVecs=Nearest[embeddings->{"Distance","Index"}];

Blend the vectors using weighted sum :

In[]:=

th=0.4;{w1,w2}={"Cat","Fish"};v1=sentenceEmbeddingBERT[aAnimals[w1]];v2=sentenceEmbeddingBERT[aAnimals[w2]];v=v1*th+(1-th)*v2;res=nfVecs[v,4];res=Select[res,!MemberQ[{w1,w2},Keys[aAnimals][[#[[2]]]]]&]

Out[]=

{{18.5622,574},{18.6729,587},{18.7387,448},{18.7785,240}}

Obtain the nearest neighbours:

In[]:=

ResourceFunction["GridTableForm"][Transpose[{res[[All,1]],Values[aAnimals][[res[[All,2]]]]}],TableHeadings->{"Distance","Text"}]

Out[]=

#	Distance	Text
1	18.5622	The western lowland gorilla (Gorilla gorilla gorilla) is one of two subspecies of the western gorilla (Gorilla gorilla) that lives in montane, primary and secondary forest and lowland swampland in central Africa in Angola, Cameroon, Central African Republic, Republic of the Congo, Democratic Republic of the Congo, Equatorial Guinea and Gabon. It is the nominate subspecies of the western gorilla, and the smallest of the four gorilla subspecies.The western lowland gorilla is the only subspecies kept in zoos with the exception of Amahoro, a female eastern lowland gorilla at Antwerp Zoo, and a few mountain gorillas kept captive in the Democratic Republic of the Congo.
2	18.6729	The woolly mammoth (Mammuthus primigenius) is an extinct species of mammoth that lived during the Pleistocene until its extinction in the early Holocene epoch. It was one of the last in a line of mammoth species, beginning with Mammuthus subplanifrons in the early Pliocene. The woolly mammoth began to diverge from the steppe mammoth about 800,000 years ago in East Asia. Its closest extant relative is the Asian elephant. The appearance and behaviour of this species are among the best studied of any prehistoric animal because of the discovery of frozen carcasses in Siberia and Alaska, as well as skeletons, teeth, stomach contents, dung, and depiction from life in prehistoric cave paintings. Mammoth remains had long been known in Asia before they became known to Europeans in the 17th century. The origin of these remains was long a matter of debate, and often explained as being remains of legendary creatures. The mammoth was identified as an extinct species of elephant by Georges Cuvier in 1796.The woolly mammoth was roughly the same size as modern African elephants. Males reached shoulder heights between 2.7 and 3.4 m (8.9 and 11.2 ft) and weighed up to 6 metric tons (6.6 short tons). Females reached 2.6–2.9 m (8.5–9.5 ft) in shoulder heights and weighed up to 4 metric tons (4.4 short tons). A newborn calf weighed about 90 kg (200 lb). The woolly mammoth was well adapted to the cold environment during the last ice age. It was covered in fur, with an outer covering of long guard hairs and a shorter undercoat. The colour of the coat varied from dark to light. The ears and tail were short to minimise frostbite and heat loss. It had long, curved tusks and four molars, which were replaced six times during the lifetime of an individual. Its behaviour was similar to that of modern elephants, and it used its tusks and trunk for manipulating objects, fighting, and foraging. The diet of the woolly mammoth was mainly grasses and sedges. Individuals could probably reach the age of 60. Its habitat was the mammoth steppe, which stretched across northern Eurasia and North America.The woolly mammoth coexisted with early humans, who used its bones and tusks for making art, tools, and dwellings, and the species was also hunted for food. It disappeared from its mainland range at the end of the Pleistocene 10,000 years ago. Isolated populations survived on St. Paul Island until 5,600 years ago and on Wrangel Island until 4,000 years ago. After its extinction, humans continued using its ivory as a raw material, a tradition that continues today. With a genome project for the mammoth completed in 2015, it has been proposed the species could be recreated through various means, but none of these is yet feasible.
3	18.7387	The resplendent quetzal ( ) (Pharomachrus mocinno) is a bird in the trogon family. It is found from Chiapas, Mexico to western Panama (unlike the other quetzals of the genus Pharomachrus, which are found in South America and eastern Panama). It is well known for its colorful plumage. There are two subspecies, P. m. mocinno and P. m. costaricensis.The resplendent quetzal plays an important role in various types of Mesoamerican mythology. It is the national bird of Guatemala, and its image is found on the country's flag and coat of arms. It also lends its name to the country's currency, the Guatemalan quetzal (abbreviation GTQ).
4	18.7785	This is a list of animals that live in the Galápagos Islands.

Conclusion

This project set out to approximate the “space between” two texts by blending their vector embeddings and retrieving meaningful intermediates using nearest neighbour searches. By applying Markov Chains and LLM-based rephrasing, I generated variations that reflected shared semantic features of the original inputs. Moreover, the use of different scales (paragraphs versus sentences) and different embedding approaches (semantic indices versus BERT-based representations) underscored the importance of granularity and model interpretability in such tasks. This suggests that exploring inter-concept space is not only about finding mathematical averages of meaning, but also about understanding the linguistic structures and contexts that shape semantic transitions. Although I did not implement true text diffusion—which would involve training a model to recover original text from noisy variants—this work demonstrates that combining embeddings and guided generation can produce coherent, interpretable transitions between concepts.

Limitations and Future Plans

The major limitations in the project were that I was not able to program the actual process of text diffusion, and hence unable to find text that corresponds most closely to the blended vectors.

One of the most common models for text diffusion is corrupting and adding noise to text and then training a model to output the original text when given a noisy version.
I want to attempt this using prefix trees and Markov Chain and creating a database with innumerous noisy variations of all sentences in a large text. Then I will train a neural network with the data to be able to return the original sentence when given any variation. Due to the high levels of computing power and the extended amount of time that this extension of the project would take, it’s something I want to attempt in the future.

Here is a description of an alternative much simpler algorithm for iterative refinement of word sequences that is a type of text diffusion.

Derive the merged vector

of two concepts

Randomly select words from the descriptions of concepts-to-be-merged

Denote this sequence of words with

Keep track of the refinement index

i=0

Refine the sequence

with one or several Markov chain refinements

For some (

) subsequences of

n-1

subsequent words find a possible

next word using a

-gram prefix tree

Find the embedding vector

Find the distance

and

is larger than

i-1

(for

i>0

)

Discard the iteration refinement:

i-1

Go to step 3

is smaller than a predetermined threshold give

as the result

Otherwise go to step 3

Other text diffusion procedures using specially trained neural network for the refinement step. (Step 3 above.)

Acknowledgements

I would like to sincerely thank my mentor Anton Antonov for not only guiding me with every possible aspect and step of this project, but being just as deeply invested as me throughout. I also want to thank the program director, Megan for greatly assisting me in both personal and project-related contexts. Lastly, I’d like to thank WSRP for providing me this opportunity to explore my interest in computational thinking and greatly deepen my understanding.

References

CITE THIS NOTEBOOK

What is the linguistic analog of Inter-concept space?
by Delisha Agarwal
Wolfram Community, STAFF PICKS, July 10, 2025
https://community.wolfram.com/groups/-/m/t/3499159

Introduction

Setup

What is a concept?

Semantic Search Index

Vector blending

Markov Chain Text Generation

Applying LLM Calls

Larger Item Pool

Feature Extraction

Using BERT as an LLM

Sentence embedding in BERT

Single words as concepts

Merging words

Including the definitions

Conclusion

Limitations and Future Plans

Acknowledgements

References

Articles

Notebooks

Functions, paclets, other tools

Data

CITE THIS NOTEBOOK