Function Repository Resource:

TextEmbeddingPlot

Create a plot of the dimension-reduced text embeddings of a collection of strings

Contributed by: John McNally

ResourceFunction["TextEmbeddingPlot"][{string1, string2, …}, dim]

computes an embedding vector for each of the strings, reduces the vectors to dimension dim, and plots the result.

ResourceFunction["TextEmbeddingPlot"][{list1, list2, …}, dim]

computes an embedding vector for each string in each of the lists, reduces the vectors to dimension dim, and plots the result.

The argument dim can be either 2 or 3.
By default, embedding vectors are computed using ResourceFunction["SentenceBERTEmbedding"].
By default, embedding vectors are computed with "Real32" working precision.
A Graphics object is returned.
The option "EmbeddingFunction" can be an arbitrary function which takes a list of strings as input and outputs a list of corresponding vectors.
The option "ReductionMethod" can be set to any of the Method settings accepted by DimensionReduction. It can also be set to a specific DimensionReducerFunction whose input dimension matches the embedding vectors and whose output dimension equals dim. Alternatively, it can be set to an arbitrary function that accepts embedding vectors of the input dimension and returns vectors of dimension dim.
TextEmbeddingPlot accepts the same options as ListPlot (when dim is 2) and ListPointPlot3D (when dim is 3).
The default LabelingFunction is set to Tooltip, with the original strings used as labels.
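
The three stages described above (embed, reduce, plot) can be approximated with built-in functions alone. The following is a minimal sketch, not the actual implementation of TextEmbeddingPlot: a letter-count embedding stands in for the default SentenceBERT embedding, and the names strings, vectors and reduced are illustrative.

```wolfram
(* Illustrative input; any list of strings would do *)
strings = {"apple", "banana", "cherry", "grape", "carrot", "celery", "potato", "tomato"};

(* Stage 1: embed each string as a 26-dimensional vector of letter counts.
   This is a stand-in for the default SentenceBERT embedding. *)
vectors = Map[N[Lookup[LetterCounts[#], Alphabet[], 0]] &, strings];

(* Stage 2: reduce the vectors to 2 dimensions *)
reduced = DimensionReduce[vectors, 2];

(* Stage 3: plot the points, attaching each original string as a tooltip *)
ListPlot[MapThread[Tooltip, {reduced, strings}]]
```

Replacing ListPlot with ListPointPlot3D and the target dimension 2 with 3 gives the three-dimensional analogue.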

Examples

Basic Examples (2) 

Plot the embeddings of a list of strings:

In[1]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][Alphabet[], 2]
Out[1]=

Compare sentence embeddings from different texts:

In[2]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  TextSentences /@ {ResourceData["Gettysburg Address"], ResourceData["Friends, Romans, Countrymen"]}, 3]
Out[2]=

Prepare sentences from three different text sources:

In[3]:=
sources = {"Alice in Wonderland", "Declaration of Independence", "Gettysburg Address"};
data = Map[Take[TextSentences@ResourceData[#], UpTo[50]] &, sources];

Create a labeled plot:

In[4]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 3, PlotLegends -> sources]
Out[4]=

Present the same data with a different dimension reduction method:

In[5]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 3, PlotLegends -> sources, "ReductionMethod" -> "UMAP"]
Out[5]=

Define a custom embedding function using an external service, which also prints information on model and token use:

In[6]:=
customEmbedder[stringlist_List] /; AllTrue[stringlist, StringQ] :=
  Module[{response},
    response = ServiceExecute["OpenAI", "Embedding", {"Input" -> stringlist}];
    (* Print everything except the vectors, such as model and token use *)
    Print@KeyDrop[response, "Content"];
    response["Content"]
  ]

Use the custom embedding function on the previously defined data:

In[7]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data[[All, 1 ;; 10]], 2, "EmbeddingFunction" -> customEmbedder]
Out[7]=

The "EmbeddingFunction" should accept a list of strings as input. Use a naive embedding function that computes letter counts:

In[8]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 2, "EmbeddingFunction" -> Map[Lookup[LetterCounts[#], Alphabet[], 0] &]]
Out[8]=

A function specified for "EmbeddingFunction" should output a corresponding list of vectors or NumericArray objects:

In[9]:=
randomEmbedder[list_] := NumericArray[RandomReal[{-1, 1}, {Length@list, 384}], "Real32"]
In[10]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 2, "EmbeddingFunction" -> randomEmbedder]
Out[10]=
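
The input and output contract for "EmbeddingFunction" described above can be checked before plotting. The following is a minimal sketch: validEmbedderQ is a hypothetical helper, not part of TextEmbeddingPlot, and it only covers embedders that return either a list of numeric vectors or a single two-dimensional NumericArray.

```wolfram
(* Hypothetical helper: True if embedder maps n strings to n numeric vectors of equal length *)
validEmbedderQ[embedder_, strings : {__String}] :=
  Module[{out = Normal[embedder[strings]]},
    MatrixQ[out, NumericQ] && Length[out] === Length[strings]
  ]

(* The two embedders used in the examples above *)
letterEmbedder = Map[Lookup[LetterCounts[#], Alphabet[], 0] &];
randomEmbedder[list_] := NumericArray[RandomReal[{-1, 1}, {Length@list, 384}], "Real32"]

validEmbedderQ[letterEmbedder, {"one", "two", "three"}]  (* True *)
validEmbedderQ[randomEmbedder, {"one", "two", "three"}]  (* True *)

(* StringLength returns one number per string rather than one vector per string *)
validEmbedderQ[StringLength, {"one", "two", "three"}]    (* False *)
```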

Method options of DimensionReduction are supported:

In[11]:=
GraphicsRow@Table[
  ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
    data, 2, "ReductionMethod" -> m],
  {m, {Automatic, "TSNE", "UMAP"}}]
Out[11]=

Use a specific dimension reduction function:

In[12]:=
dr = DimensionReduction[RandomReal[{-100, 100}, {1000, 384}], 2]
Out[12]=
In[13]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 2, "ReductionMethod" -> dr]
Out[13]=

Define an arbitrary function to perform the dimension reduction on embedding vectors:

In[14]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 2, "ReductionMethod" -> (Take[#, -2] &)]
Out[14]=
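
A slightly less naive arbitrary reduction is a fixed random projection. The following is a sketch under two assumptions: that the embedding vectors have length 384, as in the other examples on this page, and that the function is applied to one vector at a time, as the Take example above suggests. The names inputDim, projection and randomProjection are illustrative.

```wolfram
(* Assumed embedding dimension; adjust to match the embedding function in use *)
inputDim = 384;

(* Fix the seed so the same projection matrix is generated on every evaluation *)
SeedRandom[1234];
projection = RandomVariate[NormalDistribution[], {2, inputDim}]/Sqrt[2.];

(* Map one embedding vector to a 2D point; Normal converts a NumericArray to a list *)
randomProjection = (projection . Normal[#] &);

(* Check the output dimension on a test vector *)
Length[randomProjection[RandomReal[{-1, 1}, inputDim]]]  (* 2 *)
```

Under those assumptions, the function could then be supplied as "ReductionMethod" -> randomProjection.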

Use the default labeling behavior of ListPlot:

In[15]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 2, LabelingFunction -> Automatic]
Out[15]=

Use the default labeling behavior of ListPointPlot3D:

In[16]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 3, LabelingFunction -> Automatic]
Out[16]=

Define a custom LabelingFunction:

In[17]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data[[All, 1 ;; 10]], 2, LabelingFunction -> (StringTake[#3[[3, 1]], UpTo[20]] <> " ..." &)]
Out[17]=

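A "ReductionMethod" that is not one of the Method settings supported by DimensionReduction must accept input vectors of the embedding dimension and return vectors of dimension dim. Here the first reducer returns three-dimensional output for a two-dimensional plot, and the second expects 15-dimensional input: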

In[18]:=
wrongOutputDim = DimensionReduction[RandomReal[{-100, 100}, {1000, 384}], 3]
Out[18]=
In[19]:=
ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 2, "ReductionMethod" -> wrongOutputDim]
Out[19]=
In[20]:=
wrongInputDim = DimensionReduction[RandomReal[{-100, 100}, {1000, 15}], 2]
Out[20]=
In[21]:=
Enclose@ConfirmQuiet@ResourceFunction[CloudObject["https://www.wolframcloud.com/obj/jmcnally0/DeployedResources/Function/TextEmbeddingPlot"]][
  data, 2, "ReductionMethod" -> wrongInputDim]
Out[21]=
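
Mismatches like these can be caught before plotting by applying the reducer to a test vector. The following is a minimal sketch: outputDimension is a hypothetical helper, not part of TextEmbeddingPlot, and 384 is assumed to be the embedding dimension, as in the examples above.

```wolfram
(* Hypothetical helper: the length of the vector a reducer returns for an input
   of length inputDim, or $Failed if the reducer does not return a numeric vector *)
outputDimension[reducer_, inputDim_Integer] :=
  Module[{result = Quiet[reducer[RandomReal[{-1, 1}, inputDim]]]},
    If[VectorQ[result, NumericQ], Length[result], $Failed]
  ]

(* A reducer trained on 384-dimensional data with 3 output dimensions *)
reducer3 = DimensionReduction[RandomReal[{-100, 100}, {1000, 384}], 3];

(* Compare this value against the dim argument intended for the plot *)
outputDimension[reducer3, 384]  (* 3 *)
```

A reducer is suitable for a given plot only when this value equals dim.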