Based on the repo and paper “Tree of Thoughts: Deliberate Problem Solving with Large Language Models”: https://arxiv.org/abs/2305.10601

Language models have proven effective problem solvers across a wide range of problems in many domains but are limited by a token-level decision-making process during inference. Because of this, initial decisions play a pivotal role and early mistakes can prevent success in later decision making. Tree of Thoughts (ToT) is a decision making framework introduced in (1) that generalizes Chain of Thought prompting (2), enabling language models to explore branching possibilities with coherent units of text (thoughts) as an intermediate step. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. In this project, we implement the Tree of Thoughts framework in Wolfram language for the first time, reproduce a solution for the game of 24, and begin to explore the use of this framework for more general problem solving in the context of coding challenges.

Initialization


Alternate Implementation with ServiceConnect


Game of 24

Game of 24 is a mathematical reasoning challenge, where the goal is to use 4 numbers and basic arithmetic operations (+-*/) to obtain 24.
For example, given input “4 9 10 13”, a solution output could be “(10-4)*(13-9)=24”.

In order to approach Game of 24 within the Tree of Thoughts framework we must first break it down into steps. A natural way to do this is to take two numbers at a time and apply an arithmetic operation
13 - 10 = 3
After this operation we have used two of our numbers and have two left (4 and 9) in addition to our result (3) so we have:
Step 1: 13 - 10 = 3 (left: 4 9 3)
After two more steps we can reach the solution:
Step 2: 9 - 3 = 6 (left: 4 6)
Step 3: 4 * 6 = 24 (left: 24)
Now that we have reached 24 we can construct our solution:
Answer: 4 * (9 - (13 - 10)) = 24

Before we proceed toward our solution with Tree of Thoughts, we provide some motivation as to why ToT or similar frameworks are very helpful for problem solving with LLMs.

First let’s look at how ChatGPT, in this case GPT-4 approaches solving this problem. Using the steps we provided above we will provide a 1-shot chain of thought (CoT) reasoning prompt using a handy function built into Wolfram Language as of version 13.3, LLMFunction (3).

Note: in order to run this notebook you must have an OpenAI API key (which you can set up here https://openai.com/blog/openai-api ). After getting your API key you can set it in your Wolfram environment using:

SystemCredentials[OPENAI_API_KEY] = "your_key"

Attempting to Solve Game of 24 with Chain of Thought Prompting

In[]:=

cot1Shot=LLMFunction["Use numbers and basic arithmetic operations (+ - * /) to obtain 24.Each step, you are only allowed to choose two of the remaining numbers to obtain a new number. You must use the four numbers in the input and only those numbers in your answer.Input: 4 4 6 8Steps:4 + 8 = 12 (left: 4 6 12)6 - 4 = 2 (left: 2 12)2 * 12 = 24 (left: 24)Answer: (6 - 4) * (4 + 8) = 24Input: `input`",LLMEvaluator->llmParams];

In[]:=

cot1ShotResponse=cot1Shot[<|"input"->"4 9 10 13"|>]

Out[]=

Steps:13 - 9 = 4 (left: 4 4 10)4 * 4 = 16 (left: 16 10)16 + 10 = 26 (left: 26)Answer: (13 - 9) * 4 + 10 = 26Note: It seems there's no way to obtain 24 with these numbers using basic arithmetic operations.

With a 1-shot prompt GPT-4 gets close to a solution obtaining a result of 26 and recognizing its mistake, states (falsely) that there is no way to obtain 24 with these numbers.

We can improve the performance with this approach by using more examples. Below we provide a 5-shot chain of thought prompt:

In[]:=

cot5Shot=LLMFunction["Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number. You must use the four numbers in the input and only those numbers in your answer.Input: 4 4 6 8Steps:4 + 8 = 12 (left: 4 6 12)6 - 4 = 2 (left: 2 12)2 * 12 = 24 (left: 24)Answer: (6 - 4) * (4 + 8) = 24Input: 2 9 10 12Steps:12 * 2 = 24 (left: 9 10 24)10 - 9 = 1 (left: 1 24)24 * 1 = 24 (left: 24)Answer: (12 * 2) * (10 - 9) = 24Input: 4 9 10 13Steps:13 - 10 = 3 (left: 3 4 9)9 - 3 = 6 (left: 4 6)4 * 6 = 24 (left: 24)Answer: 4 * (9 - (13 - 10)) = 24Input: 1 4 8 8Steps:8 / 4 = 2 (left: 1 2 8)1 + 2 = 3 (left: 3 8)3 * 8 = 24 (left: 24)Answer: (1 + 8 / 4) * 8 = 24Input: 5 5 5 9Steps:5 + 5 = 10 (left: 5 9 10)10 + 5 = 15 (left: 9 15)15 + 9 = 24 (left: 24)Answer: ((5 + 5) + 5) + 9 = 24Input: `input`",LLMEvaluator->llmParams];

In[]:=

cot5ShotResponse=cot5Shot[<|"input"->"4 9 10 13"|>]

Out[]=

Steps:13 - 10 = 3 (left: 3 4 9)9 - 3 = 6 (left: 4 6)4 * 6 = 24 (left: 24)Answer: 4 * (9 - (13 - 10)) = 24

In the 5-shot chain of thought case, GPT-4 provides a valid solution to the problem for the given input.
With more difficult inputs, GPT-4 is unable to find a solution with the 5-shot prompt.

In[]:=

cot5ShotResponse=cot5Shot[<|"input"->"3 4 5 13"|>]

Out[]=

Steps:13 - 4 = 9 (left: 3 5 9)9 - 5 = 4 (left: 3 4)3 * 4 = 12 (left: 12)12 * 2 = 24 (left: 24)Answer: 2 * (3 * (13 - 4 - 5)) = 24

Generate Thoughts with the Propose Prompt (1-shot learning example)

Tree of Thoughts improves language models’ problem solving ability by enabling them to explore a branching web of possible decisions. At any given step in our problem solving process we consider several options, evaluate them, and keep the best options in our set of options. The first step for any given input (either the task input or numbers left after an operation) is to generate a list of proposed options we can consider.

In[]:=

generateProposals=LLMFunction["Input: 2 8 8 14Possible next steps:2 + 8 = 10 (left: 8 10 14)8 / 2 = 4 (left: 4 8 14)14 + 2 = 16 (left: 8 8 16)2 * 8 = 16 (left: 8 14 16)8 - 2 = 6 (left: 6 8 14)14 - 8 = 6 (left: 2 6 8)14 / 2 = 7 (left: 7 8 8)14 - 2 = 12 (left: 8 8 12)Input: `input`Possible next steps:",StringSplit[#,"\n"]&,LLMEvaluator->llmParams];

Now we can use this function to generate a list of options for our task input and visualize the results as a tree.

In[]:=

taskInput="4 5 6 10";proposals=generateProposals[<|"input"->taskInput|>]

Out[]=

{4 + 5 = 9 (left: 6 9 10),5 + 6 = 11 (left: 4 10 11),10 - 4 = 6 (left: 5 6 6),6 + 4 = 10 (left: 5 10 10),10 / 5 = 2 (left: 2 4 6),6 - 5 = 1 (left: 1 4 10),10 - 6 = 4 (left: 4 4 5),5 * 4 = 20 (left: 6 10 20) ,10 / 4 = 2.5 (left: 2.5 5 6)}

In[]:=

Tree[StringTemplate["input: `input`"][<|"input"->taskInput|>],StringReplace[#,"("->"\n("]&/@proposals,TreeLayout->Left]

Out[]=

Evaluate Proposed Thoughts with Value Prompt

After we have generated a list of proposals we need to evaluate them so we can decide whether or not we should continue down a particular path of reasoning. In order to do this we use three categories which we will assign numerical values to form a rough heuristic for evaluating a particular proposal. Below we give an 8-shot prompt with varying inputs. If the numbers reach 24 we return “sure”. If they cannot reach 24 we return “impossible” and if we can get close we return “likely”. It is important to keep in mind that this is a rough heuristic and that the model may make mistakes in the evaluation that will persist later.

In[]:=

evaluateProposal=LLMFunction["Evaluate if given numbers can reach 24 (sure/likely/impossible)10 1410 + 14 = 24sure11 1211 + 12 = 2312 - 11 = 111 * 12 = 13211 / 12 = 0.91impossible4 4 104 + 4 + 10 = 8 + 10 = 184 * 10 - 4 = 40 - 4 = 36(10 - 4) * 4 = 6 * 4 = 24sure4 9 119 + 11 + 4 = 20 + 4 = 24sure5 7 85 + 7 + 8 = 12 + 8 = 20(8 - 5) * 7 = 3 * 7 = 21I cannot obtain 24 now, but numbers are within a reasonable rangelikely5 6 65 + 6 + 6 = 17(6 - 5) * 6 = 1 * 6 = 6I cannot obtain 24 now, but numbers are within a reasonable rangelikely10 10 1110 + 10 + 11 = 31(11 - 10) * 10 = 1010 10 10 are all too bigimpossible1 3 31 * 3 * 3 = 9(1 + 3) * 3 = 121 3 3 are all too smallimpossible`input`",LLMEvaluator->llmParams];

In order to evaluate a thought, we look at the numbers that remain after performing an operation.

Helper function to get left numbers from thoughts

In[]:=

getLeftNumbers[thought_String]:=First@StringCases[thought,"(left: "~~x__~~")":>x];

Evaluate proposal

In[]:=

proposal=proposals[[1]];evaluatedT=evaluateProposal[<|"input"->getLeftNumbers[proposal]|>]

Out[]=

6 + 9 + 10 = 25(10 - 6) * 9 = 36I cannot obtain 24 now, but numbers are within a reasonable rangelikely

Helper function to get value from evaluated thoughts

We assign to each category a numerical value. We heavily weight cases where the model is “sure” to avoid missing paths that can reach a solution.

In[]:=

getValueHelper[evaluatedT_]:=If[StringQ[evaluatedT],Switch[ToLowerCase[Last[StringSplit[evaluatedT,"\n"]]],"likely",1,"sure",20,"impossible",0.1,_,0],0];

In[]:=

value=getValueHelper[evaluatedT]

Out[]=

In[]:=

proposals

Out[]=

{4 + 5 = 9 (left: 6 9 10),5 + 6 = 11 (left: 4 10 11),6 + 10 = 16 (left: 4 5 16),4 * 5 = 20 (left: 6 10 20),10 - 4 = 6 (left: 5 6 6),10 - 5 = 5 (left: 4 5 6),6 - 4 = 2 (left: 2 5 10),6 / 4 = 1.5 (left: 1.5 5 10),10 / 5 = 2 (left: 2 4 6)}

For each of our proposed steps we get a value. We perform n=3 evaluations for each proposal and sum the results

values=getValues[proposals,taskInput]

Out[]=

{3,41,3,2.1,2.1,2.1,2.1,2.1,3}

Here we visualize the results after generating proposals for the first step and evaluating.

valuedProposals=(Tree[StringReplace[proposals[[#]],"("->"\n("],{StringForm["value = ``",NumberForm[values[[#]],2]]}])&/@Range[Length[proposals]];Tree[StringTemplate["input: `input`"][<|"input"->taskInput|>],valuedProposals,TreeLayout->Left,TreeElementStyle->{{_}->LightYellow,{_,__}->LightBlue},ImageSize->300]

Out[]=

Select Top Proposals and Repeat The Process

The next step is to select the top proposals based on value and generate new proposals for each of these with the remaining numbers as input. Here we select the top 5 proposals from the previous step.

In[]:=

nSelect=5;ids=Range[Length[proposals]];sortedIDs=Reverse@SortBy[ids,values[[#]]&];selectIDs=Take[sortedIDs,Min[nSelect,Length[sortedIDs]]];newProposals=proposals[[#]]&/@selectIDs

Out[]=

{10 - 5 = 5 (left: 4 5 6),10 - 4 = 6 (left: 5 6 6),4 + 5 = 9 (left: 6 9 10),10 / 4 = 2.5 (left: 2.5 5 6),6 - 4 = 2 (left: 2 5 10)}

valuedProposals=(Tree[proposals[[#]],{StringForm["value = ``",NumberForm[values[[#]],2]]}])&/@Range[Length[proposals]];

Here is the pruned tree with the top 5 proposals.

In[]:=

Tree[StringTemplate["input: `input`"][<|"input"->taskInput|>],StringReplace[#,"("->"\n("]&/@newProposals,TreeLayout->Left,TreeElementStyle->{{_}->LightYellow},ImageSize->200]

Out[]=

proposals=Flatten[getProposals[#,taskInput]&/@newProposals];

In[]:=

first=StringSplit[proposals[[1]],"\n"][[1]];split=StringSplit[#,"\n"]&/@proposals;filtered=(#[[2]]&)/@Select[split,#[[1]]==first&];

Here we visualize a sub-tree for a select path in our tree.

In[]:=

Tree[first,StringReplace[#,"("->"\n("]&/@filtered,TreeLayout->Left,TreeElementStyle->{{}->LightYellow,{_}->LightOrange},ImageSize->250]

Out[]=

Generate and Evaluate Answers

In order to produce a solution to the task, we must write the final equation resulting from the steps we have followed so far. We know we have reached the final step when our previous operation results in only “24” remaining i.e. our output ends in “(left: 24)”.

In[]:=

generateAnswer=LLMFunction["Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number.Input: 4 4 6 8Steps:4 + 8 = 12 (left: 4 6 12)6 - 4 = 2 (left: 2 12)2 * 12 = 24 (left: 24)Answer: (6 - 4) * (4 + 8) = 24Input: 2 9 10 12Steps:12 * 2 = 24 (left: 9 10 24)10 - 9 = 1 (left: 1 24)24 * 1 = 24 (left: 24)Answer: (12 * 2) * (10 - 9) = 24Input: 4 9 10 13Steps:13 - 10 = 3 (left: 3 4 9)9 - 3 = 6 (left: 4 6)4 * 6 = 24 (left: 24)Answer: 4 * (9 - (13 - 10)) = 24Input: 1 4 8 8Steps:8 / 4 = 2 (left: 1 2 8)1 + 2 = 3 (left: 3 8)3 * 8 = 24 (left: 24)Answer: (1 + 8 / 4) * 8 = 24Input: 5 5 5 9Steps:5 + 5 = 10 (left: 5 9 10)10 + 5 = 15 (left: 9 15)15 + 9 = 24 (left: 24)Answer: ((5 + 5) + 5) + 9 = 24Input: `input`Steps:`steps`",LLMEvaluator->llmParams];

In[]:=

taskInput="4 5 6 10";steps="5 * 4 = 20 (left: 6 10 20)10 + 20 = 30 (left: 6 30)30 - 6 = 24 (left: 24)";answer=generateAnswer[<|"input"->taskInput,"steps"->steps|>]

Out[]=

Answer: (5 * 4 + 10) - 6 = 24

Finally, once we arrive at an answer to our problem, we want to make sure to evaluate it to ensure that it is indeed a valid answer.

In[]:=

evaluateAnswer=LLMFunction["Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Given an input and an answer, give a judgement (sure/impossible) if the answer is correct, i.e. it uses each input exactly once and no other numbers, and reach 24.Input: 4 4 6 8Answer: (4 + 8) * (6 - 4) = 24Judge: sureInput: 2 9 10 12Answer: 2 * 12 * (10 - 9) = 24Judge: sureInput: 4 9 10 13Answer: (13 - 9) * (10 - 4) = 24Judge: sureInput: 4 4 6 8Answer: (4 + 8) * (6 - 4) + 1 = 25Judge: impossibleInput: 2 9 10 12Answer: 2 * (12 - 10) = 24Judge: impossibleInput: 4 9 10 13Answer: (13 - 4) * (10 - 9) = 24Judge: impossibleInput: `input`Answer: `answer`Judge:",LLMEvaluator->llmParams];

In[]:=

evaluatedAnswer=evaluateAnswer[<|"input"->taskInput,"answer"->answer|>]

Out[]=

sure

Now we’re ready to solve Game of 24!

Solving Game of 24 with Tree of Thoughts

Our solveTask function runs iteratively for four steps generating proposals, evaluating them, taking the best

In[]:=

(*getProposals[steps_String,taskInput_String]:=*)solveTask[taskInput_]:=Module[{nSteps=4,nSelect=5,proposals,newProposals,values,ids,sortedIDs,selectIDs},proposals={""};Do[newProposals=Flatten[getProposals[#,taskInput]&/@proposals];values=getValues[newProposals,taskInput];ids=Range[Length[newProposals]];sortedIDs=Reverse@SortBy[ids,values[[#]]&];selectIDs=Take[sortedIDs,Min[nSelect,Length[sortedIDs]]];proposals=newProposals[[#]]&/@selectIDs,{i,nSteps}];proposals];

results=solveTask["4 5 6 10"]

Out[]=

{10 / 5 = 2 (left: 2 4 6)4 - 2 = 2 (left: 2 6)2 * 6 = 12 (left: 12)2 * 12 = 24 (left: 8 8 24),10 - 4 = 6 (left: 5 6 6)5 * 6 = 30 (left: 6 30)30 - 6 = 24 (left: 24)Answer: (5 * (10 - 4)) - 6 = 24,10 / 5 = 2 (left: 2 4 6)4 - 2 = 2 (left: 2 6)6 - 2 = 4 (left: 4)14 - 4 = 10 (left: 8 10 10),10 / 5 = 2 (left: 2 4 6)4 - 2 = 2 (left: 2 6)6 / 2 = 3 (left: 3)3 - 2 = 1 (left: 1 8 8 14),10 / 5 = 2 (left: 2 4 6)4 - 2 = 2 (left: 2 6)6 / 2 = 3 (left: 3)14 / 3 = 4.67 (approximately) (left: 2 4.67 8 8)}

In[]:=

Tree[StringTemplate["input: `input`"][<|"input"->taskInput|>],treeData1,TreeLayout->Left,TreeElementStyle->treeStyle,ImageSize->600]

Out[]=

Extending to Other Problems

How can we extend the tree of thoughts methodology to other types of problems with less well defined solutions and thought spaces? We can look at using language models to solve coding challenges such as those presented in An Elementary Introduction to the Wolfram Language (4).

Below are a few selected exercises from the book.

In[]:=

exercise1="Make a combined list of the first 5 squares and cubes (numbers raised to the power 3), sorted into order.";exercise2="Find the second-to-last digit of 2^32.";exercise3="Make a graph with 50 nodes, in which node i connects to node i+1.";exercise4="Make a line plot of the numerical phase of the moon for each of the next 30 days.";exercise5="In the digit lists for the first 1000 squares, find those that begin with 9 and end with 0 or 1.";

Unlike the case of Game of 24, there is no obvious way to break down any given coding problem into a series of steps that we can use for Tree of Thoughts. One approach is to ask GPT-4 to help us break down a problem.

In[]:=

getSteps=evaluateProposal=LLMFunction["Here is a programming task in Wolfram language:`input`How would you break this problem down into a step by step procedure you could use to arrive at an answer?",LLMEvaluator->llmParams];

In[]:=

steps1=getSteps[<|"input"->exercise1|>]

Out[]=

Step 1: Generate a list of the first 5 natural numbers.Wolfram language has a built-in function `Range[]` that can help us generate a list of the first 5 natural numbers. `Range[5]` will give us `{1, 2, 3, 4, 5}`.Step 2: Square each of the numbers in the list.In Wolfram language, you can use `Map[]` or `/@` to apply a function to each element in the list. `#^2& /@ Range[5]` will give us a list of the squares of the first 5 natural numbers.Step 3: Cube each of the numbers in the list.Similarly, `#^3& /@ Range[5]` will give us a list of the cubes of the first 5 natural numbers.Step 4: Combine the list of squares and cubes.You can use `Join[]` to combine the two lists. `Join[#^2& /@ Range[5], #^3& /@ Range[5]]` will give us a combined list of the first 5 squares and cubes.Step 5: Sort the combined list into order.Finally, you can use `Sort[]` to sort the combined list into ascending order. `Sort[Join[#^2& /@ Range[5], #^3& /@ Range[5]]]` will give us the sorted combined list.Putting it all together, the Wolfram language code to solve the problem is:`Sort[Join[#^2& /@ Range[5], #^3& /@ Range[5]]]`

In this case, GPT-4 is not only able to reproduce some reasonable steps but also generates a valid answer for the problem. These steps can be used to generate a prompt that can help us solve more challenging problems.

In order to make our chain of thought prompt for proposals we use the above prompt to generate steps for a number of problems. We sample a subset of the steps produced and suggest the model to provide a possible next step.

proposalPrompt="Given a task and a list of steps generate the likely next step that gets closer to achieving the task.Task: Make a combined list of the first 5 squares and cubes (numbers raised to the power 3), sorted into order.Steps:1. Generate a list of the first five natural numbers. In Wolfram Language, you can do this with the Range[] function, as Range[5].2. Square each number in the list. This can be done by applying the power operator ^ to the list, as Range[5]^2. Wolfram Language will automatically apply the operation to each element of the list.3. Cube each number in the list. This is done in the same way as the squaring operation, but with 3 as the exponent, as Range[5]^3.Possible next step:4. Join the list of squares and the list of cubes into a single list. The Join[] function can be used for this, as Join[Range[5]^2, Range[5]^3].Task: Make a line plot of the numerical phase of the moon for each of the next 30 days.Steps:1. Generate a list of the next 30 days. In Wolfram Language, you can use the Today function to get today's date, and you can add n days to this date with Today + n Quantity[1, \"Days\"]. If you use the Table[] function to do this for n from 0 to 29, you get a list of the dates of the next 30 days.Possible next step:2. For each date in this list, calculate the numerical phase of the moon. The MoonPhase[] function in Wolfram Language does exactly this. So Table[MoonPhase[Today + n Quantity[1, \"Days\"]], {n, 30}] generates a list of the moon phases for the next 30 daysTask: In the digit lists for the first 1000 squares, find those that begin with 9 and end with 0 or 1.Steps:1. Generate a list of the squares of the first 1000 natural numbers. This can be done using the Range[] function and the power operator ^, as Range[1000]^2.2. Convert each number in this list to a list of its digits. The IntegerDigits[] function can be used for this, so IntegerDigits[Range[1000]^2] gives a list of lists, where each inner list is the digit list of a square.Possible next step:3. Search this list of lists for lists that begin with 9 and end with either 0 or 1. The Cases[] function in Wolfram Language can be used for this. The pattern {9, __, 0 | 1} matches any list that begins with 9 (9) and ends with either 0 or 1 (0 | 1), with any number of any digits in between (__). Therefore, Cases[IntegerDigits[Range[1000]^2], {9, __, 0 | 1}] returns all lists of digits that represent squares beginning with 9 and ending with 0 or 1.Input: `input`Steps: `steps`Possible next step:";

Similar to the case for Game of 24 we can ask the model to evaluate a proposal, in this case asking it whether a set of steps will accomplish (sure), is reasonable (likely) or cannot produce the desired result (impossible).

evaluateProposalPrompt="Evaluate if following the procedure given in the steps will accomplish the given task (sure/likely/impossible)Task: Find the second-to-last digit of 2^32.Steps:1. Calculate the number 2^32. In Wolfram Language, this is done simply as 2^32.2. Convert the result into a list of digits. Wolfram Language provides the function IntegerDigits[] for this purpose.3. Reverse the order of the list of digits. This can be done with the Reverse[] function. Reversing the list of digits will allow us to directly address the second-to-last digit as the second element of the reversed list (since Wolfram Language uses 1-based indexing).4. Extract the second element from the reversed list of digits. This can be done with the Part[] function (which is often abbreviated as [[ ]]), using 2 as the index.sureTask: Make a combined list of the first 5 squares and cubes (numbers raised to the power 3), sorted into order.Steps:1. Generate a list of the first five natural numbers. In Wolfram Language, you can do this with the Range[] function, as Range[5].2. Square each number in the list. This can be done by applying the power operator ^ to the list, as Range[5]^2. Wolfram Language will automatically apply the operation to each element of the list.3. Cube each number in the list. This is done in the same way as the squaring operation, but with 3 as the exponent, as Range[5]^3.I cannot accomplish the task now, but the steps are reasonablelikelyTask: Make a graph with 50 nodes, in which node i connects to node i+1.Steps:1. Generate a list of numbers from 1 to 49. In Wolfram Language, you can do this using the Table[] function as Table[i, {i, 49}]. However, for this task, we are not just generating a list of numbers; we are generating a list of directed edges between nodes.2. For each number i in this list, create an edge from node i + 1 to node i. This can be done using the -> operator in Wolfram Language. So the command Table[i + 1 -> i, {i, 49}] generates a list of edges where each node i (for i from 1 to 49) is connected to node i + 1.3. Create a graph from this list of edges. Wolfram Language has a Graph[] function that can take a list of edges as input and generate a corresponding graph.impossibleTask: Make a line plot of the numerical phase of the moon for each of the next 30 days.Steps: 1. Generate a list of the next 30 days. In Wolfram Language, you can use the Today function to get today's date, and you can add n days to this date with Today + n Quantity[1, \"Days\"]. If you use the Table[] function to do this for n from 0 to 29, you get a list of the dates of the next 30 days.2. For each date in this list, calculate the numerical phase of the moon. The MoonPhase[] function in Wolfram Language does exactly this. So Table[MoonPhase[Today + n + 1 Quantity[1, \"Days\"]], {n, 30}] generates a list of the moon phases for the next 30 days3. Create a line plot of this list of moon phases. You can use the ListLinePlot[] function for this. Therefore, ListLinePlot[Table[MoonPhase[Today + n + 1 Quantity[1, \"Days\"]], {n, 30}]] creates the desired plot.impossibleTask: In the digit lists for the first 1000 squares, find those that begin with 9 and end with 0 or 1.Steps:1. Generate a list of the squares of the first 1000 natural numbers. This can be done using the Range[] function and the power operator ^, as Range[1000]^2.2. Convert each number in this list to a list of its digits. The IntegerDigits[] function can be used for this, so IntegerDigits[Range[1000]^2] gives a list of lists, where each inner list is the digit list of a square.3. Search this list of lists for lists that begin with 9 and end with either 0 or 1. The Cases[] function in Wolfram Language can be used for this. The pattern {9, __, 0 | 1} matches any list that begins with 9 (9) and ends with either 0 or 1 (0 | 1), with any number of any digits in between (__). Therefore, Cases[IntegerDigits[Range[1000]^2], {9, __, 0 | 1}] returns all lists of digits that represent squares beginning with 9 and ending with 0 or 1.sureTask: `input`Steps:`steps`"

In order to produce a solution to the coding challenge we ask the model to produce an answer for the problem given a series of steps.

answerPrompt="Follow the steps given to write a single line of Wolfram language code that satisfies the given task.Task: Find the second-to-last digit of 2^32.Steps:1. Calculate the number 2^32. In Wolfram Language, this is done simply as 2^32.2. Convert the result into a list of digits. Wolfram Language provides the function IntegerDigits[] for this purpose.3. Reverse the order of the list of digits. This can be done with the Reverse[] function. Reversing the list of digits will allow us to directly address the second-to-last digit as the second element of the reversed list (since Wolfram Language uses 1-based indexing).4. Extract the second element from the reversed list of digits. This can be done with the Part[] function (which is often abbreviated as [[ ]]), using 2 as the index.Answer: (2^32 // IntegerDigits // Reverse)[[2]]Task: Make a combined list of the first 5 squares and cubes (numbers raised to the power 3), sorted into order.Steps:1. Generate a list of the first five natural numbers. In Wolfram Language, you can do this with the Range[] function, as Range[5].2. Square each number in the list. This can be done by applying the power operator ^ to the list, as Range[5]^2. Wolfram Language will automatically apply the operation to each element of the list.3. Cube each number in the list. This is done in the same way as the squaring operation, but with 3 as the exponent, as Range[5]^3.4. Join the list of squares and the list of cubes into a single list. The Join[] function can be used for this, as Join[Range[5]^2, Range[5]^3].5. Sort the combined list into order. This can be done with the Sort[] function, as Sort[Join[Range[5]^2, Range[5]^3]].Answer: Sort[Join[Range[5]^2,Range[5]^3]]Task: Make a graph with 50 nodes, in which node i connects to node i+1.Steps:1. Generate a list of numbers from 1 to 49. In Wolfram Language, you can do this using the Table[] function as Table[i, {i, 49}]. However, for this task, we are not just generating a list of numbers; we are generating a list of directed edges between nodes.2. For each number i in this list, create an edge from node i to node i + 1. This can be done using the -> operator in Wolfram Language. So the command Table[i -> i + 1, {i, 49}] generates a list of edges where each node i (for i from 1 to 49) is connected to node i + 1.3. Create a graph from this list of edges. Wolfram Language has a Graph[] function that can take a list of edges as input and generate a corresponding graph.Answer: Graph[Table[i -> i + 1, {i, 49}]]Task: Make a line plot of the numerical phase of the moon for each of the next 30 days.Steps:1. Generate a list of the next 30 days. In Wolfram Language, you can use the Today function to get today's date, and you can add n days to this date with Today + n Quantity[1, \"Days\"]. If you use the Table[] function to do this for n from 0 to 29, you get a list of the dates of the next 30 days.2. For each date in this list, calculate the numerical phase of the moon. The MoonPhase[] function in Wolfram Language does exactly this. So Table[MoonPhase[Today + n Quantity[1, \"Days\"]], {n, 30}] generates a list of the moon phases for the next 30 days3. Create a line plot of this list of moon phases. You can use the ListLinePlot[] function for this. Therefore, ListLinePlot[Table[MoonPhase[Today + n Quantity[1, \"Days\"]], {n, 30}]] creates the desired plot.Answer: ListLinePlot[Table[MoonPhase[Today + n Quantity[1,\"Days\"]],{n,30}]]Task: In the digit lists for the first 1000 squares, find those that begin with 9 and end with 0 or 1.Steps:1. Generate a list of the squares of the first 1000 natural numbers. This can be done using the Range[] function and the power operator ^, as Range[1000]^2.2. Convert each number in this list to a list of its digits. The IntegerDigits[] function can be used for this, so IntegerDigits[Range[1000]^2] gives a list of lists, where each inner list is the digit list of a square.3. Search this list of lists for lists that begin with 9 and end with either 0 or 1. The Cases[] function in Wolfram Language can be used for this. The pattern {9, __, 0 | 1} matches any list that begins with 9 (9) and ends with either 0 or 1 (0 | 1), with any number of any digits in between (__). Therefore, Cases[IntegerDigits[Range[1000]^2], {9, __, 0 | 1}] returns all lists of digits that represent squares beginning with 9 and ending with 0 or 1.Answer: Cases[IntegerDigits[Range[1000]^2],{9,__,0 | 1}]Task: `input`Steps:`steps`";

Once we have produced an answer, we can evaluate the answer and judge whether it produced the desired result.

evaluateAnswerPrompt="Given a task and an answer, give a judgement (sure/impossible) if the answer is correct, i.e. the line of code provided in the answer, when evaluated will give the result specified in the task.Task: Find the second-to-last digit of 2^32.Answer: (2^32 // IntegerDigits // Reverse)[[2]]Judge: sureTask: Find the second-to-last digit of 2^32.Answer: (2^32 // IntegerDigits // Reverse)[[1]]Judge: impossibleTask: Make a graph with 50 nodes, in which node i connects to node i+1.Answer: Graph[Table[i -> i + 1, {i, 49}]]Judge:sureTask: Make a graph with 50 nodes, in which node i connects to node i+1.Answer: Graph[Table[i -> i + 1, {i, 50}]]Judge:impossibleTask: Make a combined list of the first 5 squares and cubes (numbers raised to the power 3), sorted into order.Answer: Sort[Join[Range[5]^2,Range[5]^3]]Judge:sureTask: Make a combined list of the first 5 squares and cubes (numbers raised to the power 3), sorted into order.Answer: Sort[Table[i^2, {i, 5}], Table[i^3, {i, 5}]]Judge:impossibleTask: `input`Answer: `answer`Judge:"

Conclusion

We investigated the use of the Tree of Thoughts framework to extend language models’ problem solving ability for difficult problems that require exploring a diversity of intermediate thoughts. We used the newly introduced LLMFunction capability in Wolfram Language to reproduce the ToT algorithm for the Game of 24. We visualized the results of exploring and evaluating proposals for intermediate steps in this game. We produce and visualize a solution to the game by exploring the Tree of Thoughts. Finally we begin to explore extending this framework to solving coding challenges, providing prompts that could be used toward this end.

Acknowledgment

I’d like to thank Jofre Espigule, my project mentor, for his extensive support and mentorship and helpful ideas throughout the course of my project. Thanks also to Steven Wolfram, Mads, Xerxes, Yi, Danielle, Stephanie and all the other mentors and TAs that made Wolfram Summer School 2023 possible.

References

Tree of Thoughts: Deliberate Problem Solving with Large Language Models - https://arxiv.org/abs/2305.10601

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models - https://arxiv.org/abs/2201.11903

The New World of LLM Functions: Integrating LLM Technology into the Wolfram Language - https://writings.stephenwolfram.com/2023/05/the-new-world-of-llm-functions-integrating-llm-technology-into-the-wolfram-language/

An Elementary Introduction to the Wolfram Language - https://www.wolfram.com/language/elementary-introduction/3rd-ed/

Progressive-Hint Prompting Improves Reasoning in Large Language Models - https://arxiv.org/abs/2304.09797

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents - https://arxiv.org/abs/2201.07207

Inner Monologue: Embodied Reasoning through Planning with Language Models - https://arxiv.org/abs/2207.05608

OpenAI GPT-4 Technical Report - https://arxiv.org/abs/2303.08774

Reflexion: Language Agents with Verbal Reinforcement Learning - https://arxiv.org/abs/2303.11366

S.A. Sloman. The empirical case for two systems of reasoning. Psychological bulletin, 119(1): 3,1996.

CITE THIS NOTEBOOK

Tree of thoughts: exploratory problem solving with LLMs
by Rob Giometti
Wolfram Community, STAFF PICKS, July 12, 2023
https://community.wolfram.com/groups/-/m/t/2959295

Initialization

Alternate Implementation with ServiceConnect

Game of 24

Attempting to Solve Game of 24 with Chain of Thought Prompting

Generate Thoughts with the Propose Prompt (1-shot learning example)

Evaluate Proposed Thoughts with Value Prompt

Helper function to get left numbers from thoughts

Evaluate proposal

Helper function to get value from evaluated thoughts

Select Top Proposals and Repeat The Process

Generate and Evaluate Answers

Solving Game of 24 with Tree of Thoughts

Extending to Other Problems

Conclusion

Acknowledgment

References

CITE THIS NOTEBOOK

Initialization


Alternate Implementation with ServiceConnect
