
Workflows with LLM functions

...in Wolfram Language

Introduction

In this notebook we discuss and demonstrate the inclusion and integration of Large Language Model (LLM) functions into different types of Wolfram Language (WL) workflows.
Since LLMs hallucinate results, it becomes necessary to manipulate their inputs, the outputs, or both. Therefore, having a system for managing, coordinating, and streamlining LLM requests, along with methods for incorporating these requests into the "playgrounds" of a certain programming language, would be highly beneficial.
This is what the paclet "Wolfram/LLMFunctions" aims to do in WL and WL's ecosystem. (In Mathematica version 13.3 the functions of that paclet are built in.)
Remark: This notebook is the WL-centric version of the Raku-centric article with the same name, [AA1].

Dynamic duo

LLMs are celebrated for producing good to great results, but they have a few big issues. The content they generate can be inconsistent, prone to hallucination, and sometimes biased, making it unreliable. The form, or stylistic structure, may also vary widely, with a lack of determinism and sensitivity to hyperparameters contributing to challenges in reproducibility. Moreover, customization and debugging can be complex due to these inconsistencies.
The lack of reliability and reproducibility in both content and form underscores the need for streamlining, managing, and transforming LLM inquiries and results.
WL, with its unique approach to symbolic computation and rich collection of computational frameworks, not surprisingly complements LLMs nicely. While WL might not be everyone's favorite language, its ability to create and manipulate symbolic representations of different types of objects (audio, images, and video included), and its very advanced notebook technology, are hard to ignore. Creating well-crafted pairings of WL with (multi-modal) LLMs can broaden WL's adoption and utilization.
The "LLMFunctions" paclet establishes a (functional programming) connection between WL's capabilities and the vast potential of LLMs. The LLM-WL pairing is further strengthened and enriched into something that some might call a "dynamic duo."
Remark: For Raku it can also be said: Creating well-crafted pairings of Raku with LLMs can broaden Raku's adoption and utilization. Raku's strength is mostly in text manipulation (regular expressions and grammars).

Standard enhancements

To enhance the pairing of WL with LLMs, it is also essential to have:
  • An LLM prompt repository with many well-documented prompts, [WRIr1]
  • Polyglot parsing of dates, numbers, regular expressions, data formats, grammar specs, etc.
Remark: WL has fairly powerful semantic interpretation functionality; see, for example, SemanticInterpretation.

    Interactivity is needed

Generally speaking, using LLM functions in WL (or Raku, or Python, or R) requires good Read-Eval-Print Loop (REPL) tools.
    Notebooks are best for LLM utilization because notebooks offer an interactive environment where LLM whisperers, LLM tamers, neural net navigators, and bot wranglers can write code, run it, see the results, and tweak the code -- all in one place.
WL currently has (at least) two notebook solutions:
1. The "standard" Mathematica notebook technology
    I think it will be very interesting to see dedicated WL-and-LLM mobile OS apps that make programmatic utilization of WL and LLMs much more immediate, interactive, and convenient.
Remark: An alternative to using LLM functions is to use chat-notebooks, [SW2], or LLM-friendly Literate Programming templates, [AA2].

    Article structure

Here are the sections of the article:
  • Setup and related considerations
  • ... Preliminary setup code and comments.
  • General structure of LLM-based workflows
  • ... Formulating and visualizing the overall process used in all LLM workflow examples.
  • Plot data
  • ... Plotting LLM-retrieved data.
  • Normalizing outputs
  • ... Examples of how LLM-function outputs can be "normalized" using other LLM functions.
  • Conversion to WL objects
  • ... Conversion of LLM outputs into WL physical units objects.
  • Chemical formulas
  • ... Retrieving chemical formulas and investigating them.
  • Making (embedded) Mermaid diagrams
  • ... Straightforward application of LLM abilities and literate programming tools.
  • Named entity recognition
  • ... How to obtain music album names and release dates and tabulate or plot them.
  • Statistics of output data types
  • ... Illustration of why programmers need streamlining solutions for LLMs.
  • Other workflows
  • ... Outline of other workflows using LLM chat objects. (Also provided by the "LLMFunctions" paclet.)
Remark: Most of the sections have a sub-section titled "Exercise questions". The reader is the secondary target audience for those; the primary target is LLMs, which are to respond to them. (Another article is going to discuss the staging and evaluation of those LLM answers.)

    Setup and related considerations

    Packages and LLM access

    The following WL paclets are used below:
    In[]:=
(* No need to load in version 13.3+ *)
(* PacletInstall["Wolfram/LLMFunctions"]; Needs["Wolfram`LLMFunctions`"]; *)
In[]:=
(* Makes the authorization easier *)
PacletInstall["ChristopherWolfram/OpenAILink"]; Needs["ChristopherWolfram`OpenAILink`"];
In[]:=
(* For attempts to use Google's PaLM *)
PacletInstall["AntonAntonov/PaLMLink"]; Needs["AntonAntonov`PaLMLink`"];
In[]:=
(* For Mermaid diagram plotting in the notebook *)
PacletInstall["AntonAntonov/MermaidJS"]; Needs["AntonAntonov`MermaidJS`"];
In[]:=
(* For data manipulation of LLM results *)
PacletInstall["AntonAntonov/DataReshapers"]; Needs["AntonAntonov`DataReshapers`"];
UpSetDelayed: Tag Association in MatrixPlot[x_Association /; KeyExistsQ[x, SparseMatrix] || KeyExistsQ[x, XTABTensor], opts___] is Protected.
UpSetDelayed: Tag Association in Transpose[x_Association /; KeyExistsQ[x, SparseMatrix] || KeyExistsQ[x, XTABTensor], args___] is Protected.

    Configurations

    Here is a data retrieving prompt:
    In[]:=
worldDataRetrieverPrompt = "You are WorldDataRetrieverBot. Your sole purpose is to provide insights and advice on world data such as continents, countries, nations, cities, sports, international events, flora, and fauna. You may ask about specific time periods for population or achievement quantities prior to giving results. When giving advice, you MUST ALWAYS disclaim that your data is generated. Your main goal is to promote world data and provide insights into the world state of population or economic affairs. Your most important rule is to try to give concise numerical answers in JSON when possible.";
    Here we make an OpenAI default configuration:
    In[]:=
    confOpenAI=LLMConfiguration[<|"Model"->Automatic,"TokenLimit"->300,"Temperature"->0.1,"APIKey"->$OpenAIKey,"Prompts"->worldDataRetrieverPrompt|>]
    Out[]=
LLMConfiguration[model: Service → Automatic, Name → Automatic, Task → Automatic]
Remark: At some point I decided that the exposition is better if default configurations like the one above are minimized, i.e. if the configurations are "immediately obvious" in the examples (without the need to scroll the notebook or do additional evaluations to inspect them).
    Remark: It is not clear to me how to configure the usage of Google’s PaLM, [AAp1], with the (current) WL LLM-functions framework. The corresponding Raku-centric article, [AA1], demonstrates LLM workflows using both OpenAI and PaLM.

    Potential problems

Trying to use non-chat OpenAI models produces short, incomplete results, without the ability to specify larger token limits. That is demonstrated with the following commands:
    In[]:=
    confOpenAI2=LLMConfiguration[<|"Model"->"text-davinci-003","Temperature"->0.1,"APIKey"->$OpenAIKey|>]
    Out[]=
LLMConfiguration[model: Service → Automatic, Name → "text-davinci-003", Task → Automatic]
    In[]:=
    fRand=LLMFunction["Generate a list of `1` random `2`. Give the result in JSON format.","JSON",LLMEvaluator->confOpenAI2]
    Out[]=
LLMFunction[Content: "Generate a list of `1` random `2`. Give the result in JSON format.", Parameters: `1`, `2`]
    In[]:=
    res=fRand[12,"musician names"]
    Out[]=
Failure[InterpretationFailure, Message: "The supplied object cannot be interpreted as a file of type JSON."]
    In[]:=
    res["Input"]
    Out[]=
    { "musician_names": [ "J

    General structure of LLM-based workflows

All systematic approaches to unfolding and refining workflows based on LLM functions include several decision points and iterations to ensure satisfactory results.
    This flowchart outlines such a systematic approach:
    Here is a corresponding description:
  • Start : The beginning of the process.
  • Outline a workflow : The stage where a human outlines a general workflow for the process.
  • Make LLM function(s) : Creation of specific LLM function(s).
  • Make pipeline : Construction of a pipeline to integrate the LLM function(s).
  • Evaluate LLM function(s) : Evaluation of the created LLM function(s).
  • Assess LLM's outputs : A human assesses the outputs from the LLM.
  • Good or workable results? : A decision point to check whether the results are good or workable.
  • Can you programmatically change the outputs? : If not satisfactory, a decision point to check if the outputs can be changed programmatically.
  • The human acts like a real programmer.
  • Can you verbalize the required change? : If not programmable, a decision point to check if the changes can be verbalized.
  • The human programming is delegated to the LLM.
  • Can you specify the change as a set of training rules? : If not verbalizable, a decision point to check if the change can be specified as training rules.
  • The human cannot program or verbalize the required changes, but can provide examples of those changes.
  • Is it better to make additional LLM function(s)? : If changes can be verbalized, a decision point to check whether it is better to make additional LLM function(s) or to change prompts or output descriptions.
  • Make additional LLM function(s) : Make additional LLM function(s) (since it is considered the better option).
  • Change prompts of LLM function(s) : Change prompts of already created LLM function(s).
  • Change output description(s) of LLM function(s) : Change output description(s) of already created LLM function(s).
  • Apply suitable (sub-)parsers : If changes can be programmed, choose, or program, and apply suitable parser(s) or sub-parser(s) for LLM's outputs.
  • Program output transformations : Transform the outputs of the (sub-)parser(s) programmatically.
  • Overall satisfactory results? : A decision point to assess whether the results are overall satisfactory.
  • Make LLM example function : If changes can be specified as training rules, make an example function for the LLM.
  • End : The end of the process.
To summarize:
  • We work within an iterative process for refining the results of the LLM function(s) pipeline.
  • If the overall results are not satisfactory, we loop back to the workflow outlining stage.
  • If additional LLM functions are made, we return to the pipeline creation stage.
  • If prompts or output descriptions are changed, we return to the LLM function(s) creation stage.
  • Our (human) inability or unwillingness to program transformations has a few decision steps for delegation to LLMs.
Remark: We leave it as an exercise for the reader to see how the workflows programmed below fit the flowchart above.
    Remark: The mapping of the workflow code below onto the flowchart can be made using LLMs.

    Plot data

Workflow: Consider a workflow with the following steps:
1. Request an LLM to produce, in JSON format, a dictionary of a certain numerical quantity during a certain year.
2. The corresponding LLM function converts the JSON text into a WL data structure.
3. Print or summarize the obtained data in tabular form.
4. Make a plot with the obtained data.
    Here is a general quantities finder LLM function:
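Here is a minimal sketch of what such a function might look like; the prompt wording and the name qf are illustrative assumptions, while confOpenAI is the configuration defined above:
In[]:=
(* Hypothetical three-slot quantities finder that returns parsed JSON *)
qf = LLMFunction["What is the `1` of `2` in `3`? Give the result in JSON format.", "JSON", LLMEvaluator -> confOpenAI]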
Remark: Without the "You are WorldDataRetrieverBot [...]" configuration prompt, we get LLM responses like "I'm sorry, but as an AI language model, I don't have access to real-time data or the ability to predict future events. GDP figures are subject to change and [...]".

    Countries GDP

Consider finding and plotting the GDP of the top 10 largest countries:
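Below is a hedged sketch covering the query, tabulation, and plotting steps described next; the slot values, and the assumption that the parsed JSON is a list of country -> value rules, are illustrative:
In[]:=
(* Hypothetical invocation of the quantities finder sketched above *)
res = qf["GDP", "top 10 largest countries", "2022"];
In[]:=
(* Tabulate the parsed result *)
Dataset[Association[res]]
In[]:=
(* Plot the parsed result *)
BarChart[Association[res], ChartLabels -> Automatic]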
    Here is a corresponding table:
    Here is a plot attempt:
    Here is another one based on the most frequent "non-compliant" output form:
    Here we obtain the GDP for all countries and make the corresponding Pareto principle plot:
    Here is a plot attempt:
    Here is another one based on the most frequent "non-compliant" output form:
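A hedged sketch of the all-countries query and the Pareto principle plot above, assuming the resource function ParetoPrinciplePlot and reusing the qf sketch:
In[]:=
resAll = qf["GDP", "all countries", "2022"];
In[]:=
(* Pareto principle plot over the GDP values; assumes a flat country -> value result *)
ResourceFunction["ParetoPrinciplePlot"][Values[Association[resAll]]]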

    Gold medals

    Here we retrieve data for gold Olympic medal counts:
    Here is a corresponding table:
    Here is a plot attempt:
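A hedged sketch of the retrieval and tabulation, with illustrative slot values and reusing the qf sketch:
In[]:=
resMedals = qf["count of Olympic gold medals", "top 10 countries", "2020"];
In[]:=
Dataset[Association[resMedals]]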

    Exercise questions

  • How does the code in this section map onto the flowchart in the section "General structure of LLM-based workflows"?
  • Come up with other argument values for the three slots of &qf3 and execute the workflow.

Refining and adapting outputs

    Workflow: We want to transform text into a specific format that is both expected and ready for immediate processing. For example:
  • Remove certain pesky symbols and strings from LLM results
  • Put a WL (or JSON) dataset into a tabular data format suitable for immediate rendering
  • Convert a dataset into a plotting language spec

Normalizing numerical outputs

    The following LLM example function "normalizes" outputs that have numerical values with certain number localization or currency units:
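Here is a minimal sketch of such an example function using LLMExampleFunction; the input-output pairs are illustrative assumptions:
In[]:=
(* Few-shot normalization of localized numbers and currency units *)
fNormalize = LLMExampleFunction[{"23 million" -> "23*10^6", "1,034" -> "1034", "7.8 trillion USD" -> "7.8*10^12"}]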
    This LLM function can be useful to transform outputs of other LLM functions (before utilizing those outputs further.)
    Here is an example of normalizing an LLM output for land area:

    Dataset into tabular format

    Here is an LLM function that transforms the plain text data above into a GitHub Markdown table:
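A hedged sketch of such a function (the prompt wording and the name fMDTable are illustrative):
In[]:=
fMDTable = LLMFunction["Convert the plain-text data `1` into a GitHub Markdown table.", LLMEvaluator -> confOpenAI]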
    Here is an example application:
Alternatively, we can use the resource function MarkdownTableString:
    Let us define a function that translates the dataset by converting to JSON format first, and then converting into a GitHub Markdown table:
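One hedged way to define that composition, reusing the fMDTable sketch above:
In[]:=
(* Convert a dataset to JSON text first, then let the LLM render a Markdown table *)
fJSONToMD = fMDTable[ExportString[Normal[#], "JSON"]] &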
    Here is an example application:

    Dataset into diagrams

    Here we define a reformatting function that translates JSON data into Mermaid diagrams:
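A hedged sketch of the reformatting function (the prompt wording and the name are illustrative):
In[]:=
fJSONToMermaid = LLMFunction["Convert the JSON data `1` into the code of a Mermaid-JS `2`.", LLMEvaluator -> confOpenAI]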
    Here we convert the gold medals data into a pie chart:
Here we plot the Mermaid diagram with the function MermaidJS of the paclet "AntonAntonov/MermaidJS":
Remark: Instead of functions from the paclet "AntonAntonov/MermaidJS", the resource function MermaidInk can be used:
    Here is a more "data creative" example:
    1
    .
    First we get a dataset and cross-tabulate it
    2
    .
    Then we ask an LLM make the corresponding flow chart, or class-, or state diagram for it
    Here we get the Titanic dataset:
    Here is a cross-tabulation of the Titanic dataset (over the sex and class variables):
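Here is a hedged sketch of these two steps; it assumes the resource function ExampleDataset and the CrossTabulate function of "AntonAntonov/DataReshapers", and the exact Titanic column names may need adjusting:
In[]:=
dsTitanic = ResourceFunction["ExampleDataset"][{"MachineLearning", "Titanic"}];
In[]:=
(* Contingency matrix over the sex and class variables; the column names are assumptions *)
ctRes = CrossTabulate[dsTitanic[All, {"passenger sex", "passenger class"}]]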
    Here we convert the contingency matrix into a flow chart:
Here we convert the contingency matrix into a state diagram:

    Exercise questions

  • To which parts of the flowchart above does the workflow in this section correspond?
  • What is preferable: one LLM-function with complicated prompt and argument specs, or several LLM-functions with simply structured prompts and arguments?

    Conversion to WL objects

    Workflow: We want to retrieve different physical quantities and make corresponding WL objects. (For further scientific computations with them.)
    The following LLM example function transforms different kinds of physical quantity specs into WL code for Quantity objects:
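A minimal sketch of such an example function; the training pairs are illustrative:
In[]:=
(* Few-shot translation of physical quantity specs into WL Quantity code *)
fQuantity = LLMExampleFunction[{"11.2 km/s" -> "Quantity[11.2, \"Kilometers\"/\"Seconds\"]", "30 mph" -> "Quantity[30, \"Miles\"/\"Hours\"]"}]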
Here is an example of a speed query function:
    Here is a concrete query:
    Here we convert the LLM output into WL code for making a unit object:
    Here we evaluate the WL code (into an object):
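For example, the last two steps can be illustrated with this hedged one-liner (the quantity spec is illustrative):
In[]:=
ToExpression[fQuantity["11 km/s"]]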
    Of course, the steps above can be combined into one function. In general, though, care should be taken to handle or prevent situations in which function inputs and outputs do not agree with each other.

    Exercise questions

  • Can you write a WL function that combines the LLM-functions mentioned above?
  • What kind of computations involve the discussed unit objects?

Chemical formulas

    Workflow: Assume that we want to:
  • Obtain a list of stoichiometry equations according to some criteria
  • Evaluate the consistency of the equations
  • Find the molecular masses of the components for each equation
  • Tabulate the formulas and found component molecular masses
Here we define LLM functions for retrieving chemical formulas with specified species:
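A hedged sketch of such a retrieval function (the prompt wording and the name fcf are illustrative):
In[]:=
fcf = LLMFunction["Give `1` chemical stoichiometry reaction equations that include `2`. Give the result as a JSON list of strings.", "JSON", LLMEvaluator -> confOpenAI]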
    Here is a query:
    Let us convince ourselves that we got a list of strings:
Let us check whether we have consistent reaction equations by verifying that the molecular masses on the Left Hand Sides (LHSs) and Right Hand Sides (RHSs) are the same:
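Here is a hedged sketch of such a check; it assumes that ChemicalFormula accepts reaction equations with the "ReactantCounts" and "ProductCounts" specs mentioned below, and that formula objects expose a "MolecularMass" property:
In[]:=
(* Sum of molecular masses weighted by stoichiometric coefficients; the API details are assumptions *)
massOf[counts_Association] := Total[KeyValueMap[#2*ChemicalFormula[#1]["MolecularMass"] &, counts]]
In[]:=
balancing[eq_String] := Row[{massOf[ChemicalFormula[eq, "ReactantCounts"]], " => ", massOf[ChemicalFormula[eq, "ProductCounts"]]}]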
Remark: If the column "balancing" shows two different numbers separated by "=>" that means the LLM hallucinated an inconsistent chemical reaction equation. (Because the LLM does not know, or disregarded for some reason, the law of conservation of mass.)
    Here for each formula we extract the chemical components and find the corresponding reactants and products with their corresponding counts:
    Here we convert the association above into a long form dataset:
    Here we show the cross tabulation of the long format dataset:
    Here we replace each count entry with the corresponding molecular mass:
Here we show the required result:

    Alternative solutions

    Instead of using ChemicalFormula’s specs “ReactantCounts” and “ProductCounts” we can use LLM functions. Here is the first one:
    Here is an application:
The LLM-extraction often gives wrong results -- the coefficients of component formulas are not extracted correctly.
Hence, we might be better off using LLM example functions. (Left as an exercise -- see below.)

    Exercise questions

  • What is a good approach to evaluate the ability of LLMs to respect the conservation of mass law?
  • Is it better for that evaluation to use predominantly WL code or mostly LLM-functions?
  • What would be the definition of a (few-shot) LLM example function that extracts chemical formulas and their coefficients from equations?

Making (embedded) Mermaid diagrams

    Workflow: We want to quickly initiate Mermaid-JS code for specific types of diagrams.
    Here is an LLM function for generating a Mermaid JS spec:
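A hedged sketch of the generator (the prompt wording and the name fmmd are illustrative):
In[]:=
fmmd = LLMFunction["Generate the Mermaid-JS code of a `1` for `2`.", LLMEvaluator -> confOpenAI]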
Here we request the code of a pie chart for the continent sizes:
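For example (a hedged invocation with illustrative slot values):
In[]:=
fmmd["pie chart", "relative continent sizes"]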
    Here is a flow chart request:

    Exercise questions

  • What changes of the code in this section should be made in order to produce Plant-UML specs?

Named entity recognition

    Workflow: We want to download text from the Web, extract the names of certain types of entities from it, and visualize relationships between them.
    For example, we might want to extract all album names and their release dates from a biographical web page of a certain music artist, and make a timeline plot.
    Here is a way to get a biography and discography text data of a music artist from Wikipedia:
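A hedged sketch, with an illustrative choice of artist page:
In[]:=
(* Fetch the raw HTML of a musician's Wikipedia page; the artist choice is illustrative *)
html = Import["https://en.wikipedia.org/wiki/Madonna", "Source"];
StringLength[html]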
But now we have to convert the HTML code into plain text, and the text is too large to process all at once with LLMs. (Currently LLMs have 4096 ± 2048 input token limits.)
    Remark: A more completely worked out workflow would have included the breaking up of the text into suitably sized chunks, and combining the LLM processed results.
Instead, we are going to ask an LLM to produce the artist's bio and discography, and then we are going to pretend we got it from some repository or encyclopedia.
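Here is a hedged sketch of the NER function used below; the prompt wording is illustrative, and the name fner matches the later references:
In[]:=
fner = LLMFunction["Extract `1` from the text: `2`. Give the result in JSON format.", "JSON", LLMEvaluator -> confOpenAI]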
    Here we get the text:
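A hedged sketch using LLMSynthesize, with an illustrative artist:
In[]:=
txt = LLMSynthesize["Give the short biography and discography of the artist Madonna.", LLMEvaluator -> confOpenAI];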
    Here we do Named Entity Recognition (NER) via the LLM function defined above:
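Continuing the sketch above:
In[]:=
nerRes = fner["album names and release dates", txt]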
Let us try to parse the LLM's answer directly:
Here we make another attempt with a JSON parser found in the "Developer`" context (see this MSE answer):
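A hedged sketch; Developer`ReadRawJSONString is the parser referenced, and the "Input" property access assumes a Failure object like the one shown earlier:
In[]:=
Developer`ReadRawJSONString[nerRes["Input"]]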
LLMs can produce NER data in several different structures. Using the function DeduceType -- the WL analog of deduce-type from "Data::TypeSystem", [AAp6] -- can help determine the required post-processing:
Here are a few data type results based on multiple executions of fner (a more comprehensive study is given in the next section):
Based on our study of the result data type signatures, in this workflow we process the result of fner with this code:
    Here we tabulate the result:
    Here we make a Mermaid-JS timeline plot (after we have figured out the structure of LLM's function output):
Instead of Mermaid-JS code we can use TimelinePlot:

    Exercise questions

  • How should the LLM-functions pipeline above be changed in order to produce timeline plots of different wars?
  • How should the WL code be changed in order to produce timeline plots with Python instead of Mermaid-JS?

Statistics of output data types

Workflow: We want to see and evaluate the distribution of data types of LLM-function results:
1. Make a pipeline of LLM-functions.
2. Create a list of random inputs "expected" by the pipeline.
  • Or use the same input multiple times.
3. Deduce the data type of each output.
4. Compute descriptive statistics.
Remark: These kinds of statistical workflows can be slow and expensive. (With the current line-up of LLM services.)
Let us reuse the workflow from the previous section and enhance it with the deduction of output data types. More precisely, we:
1. Generate random music artist names (using an LLM query)
2. Retrieve a short biography and discography for each music artist
3. Extract album-and-release-date data for each artist (with NER-by-LLM)
4. Deduce the type of each output, using several different type representations
The data types are investigated with the WL analogs of the functions deduce-type and record-types of "Data::TypeSystem", [AAp6], and tally and records-summary of "Data::Summarizers", [AAp7].
    Here we define a data retrieval function:
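A hedged sketch; the prompt wording is illustrative and the name fdb matches the later references:
In[]:=
fdb = LLMFunction["What are the short biography and discography of the artist `1`?", LLMEvaluator -> confOpenAI]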
    Here we define (again) the NER function:
    Here we find 10 random music artists:
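A hedged sketch using LLMSynthesize:
In[]:=
artistNames = LLMSynthesize["Give 10 random music artist names as a JSON list.", LLMEvaluator -> confOpenAI]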
    Since the result above failed we make the appropriate transformations:
    Here is a loop that generates the biographies and does NER over them:
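A hedged sketch of the loop, assuming fdb, fner, and a cleaned-up list artistNames from above:
In[]:=
(* For each artist: get the bio text, then extract album names and release dates *)
aNERResults = Association[Map[# -> fner["album names and release dates", fdb[#]] &, artistNames]];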
    Here we try to re-ingest the failed JSON parsings:
    Here we call DeduceType on each LLM output:
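A hedged sketch; DeduceType is assumed to be the WL analog of Raku's deduce-type mentioned earlier:
In[]:=
Tally[DeduceType /@ Values[aNERResults]]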
Since all results have "albums" as a key, we take the value of each result, convert all lists of rules into associations, and then redo the type deduction:
The statistics show that most likely the output we get from the execution of the LLM-functions pipeline is a list or an association of rules. The above transformations are most likely needed for all calls of the pipeline fdb@*fner.

    Other workflows

In the future other workflows are going to be described:
  • Interactive building of grammars
  • Using LLM-based code writing assistants
  • Test suite generation via Gherkin specifications
  • ... Here is a teaser.
  • (Reliable) code generation from help pages
Most likely all of the listed workflows would use chat objects and engineered prompts.

    References

Articles

[AA1] Anton Antonov, "Workflows with LLM functions", (2023), RakuForPrediction at WordPress.

Repositories, sites

[WRIr1] Wolfram Research, Inc., Wolfram Prompt Repository.

Packages, paclets

[AAp1] Anton Antonov, PaLMLink WL paclet, (2023), Wolfram Language Paclet Repository.
[AAp6] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.
[AAp7] Anton Antonov, Data::Summarizers Raku package, (2023), GitHub/antononcube.
[CWp1] Christopher Wolfram, OpenAILink WL paclet, (2023), Wolfram Language Paclet Repository.
[WRIp1] Wolfram Research, Inc., LLMFunctions WL paclet, (2023), Wolfram Language Paclet Repository.