This project applies personality testing, specifically the “Big Five” test framework, to an LLM that has been prompted to assume different personas. We first implement a pipeline that generates prompts for these personas, each targeting a different set of traits. We then give a 300-question personality test to each personality-prompted LLM and store its answers as a vector, which we use to compute per-trait scores following Big Five scoring in human psychology. We evaluate the vector and the test scoring as mechanisms for encoding and evaluating LLM personality traits by assessing whether the test correctly identifies the traits targeted by each prompt, and we find that they succeed at this task. We also compare the score vectors to one another in vector space and analyze their clustering, finding that personas with similar Big Five test scores tend to cluster together regardless of whether they share prompted traits.

Introduction

As AI, and particularly Large Language Models (LLMs), find more and more use cases in our business workflows and daily lives, it becomes increasingly important to understand how LLMs think and behave. However, because LLMs are notorious black boxes whose inner workings are nearly impossible to discern, studying their behavior has proven incredibly difficult. One possible approach, and the one I take in this project, is to borrow methods from human psychology, a field dedicated to studying an even more complex black box than an LLM: the human brain.
The “Big Five” test is a psychological framework for testing and understanding human personality. The test measures 30 personality traits (facets), which are grouped into the “Big Five” or “OCEAN” categories: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. In this project, we adapt a 300-question survey designed to test humans for these traits, and we build a computational pipeline for evaluating LLM responses to these questions.
This project is meant to be a proof of concept that aspects of an LLM’s personality (or, if you will, its brain) can be quantified and measured from outside the model, exclusively through prompting and data analysis. It is motivated by a larger ambition: if we can encode accurate descriptions of LLM personality in a vector, that vector might be used not only to describe current behavior but also to predict future LLM decision-making. This project, however, only establishes the proof of concept, namely that our personality vectors appear to correctly encode the personality traits targeted by the prompts. I therefore leave further discussion of the prediction idea to the Future Work section.

Generating LLM Personas

The Big Five test measures 30 different personality traits. Ideally, our persona generation should be able to tune any of these 30. Dr. John A. Johnson, Professor Emeritus of Psychology at Penn State University, hosts an online version of the Big Five test [1] which includes descriptions of each of these 30 traits as well as what high and low scores in that category on the Big Five test represent. We have adapted these descriptions into prompts designed to be fed to LLMs to convey that personality trait. Each trait has two associated prompts—one that is designed to induce a high score in that category on the Big Five test and one that is designed to induce a low score. The prompts are listed in the table below. Note that the traits are also grouped into their respective OCEAN categories.
Out[]=
Prompt
O
Imagination
You believe that the real world is often too plain and ordinary. You use fantasy as a way of creating a richer, more interesting world. You are more oriented to fantasy than facts.
You believe the real world is rich and engaging as it is. You prefer to focus on facts and reality rather than escape into fantasy. You are more oriented to facts than fantasy.
Artistic Interests
You love beauty, both in art and in nature. You become easily involved and absorbed in artistic and natural events. You may or may not be artistically trained or talented. You are interested in and have an appreciation of natural and artificial beauty.
You are largely indifferent to beauty, whether in art or nature. You rarely become emotionally involved or absorbed in artistic or natural experiences. Regardless of any artistic training or talent, you have little interest in or appreciation for natural or man-made beauty.
6 total ›
C
Self-efficacy
You have confidence in your ability to accomplish things. You believe you have the intelligence (common sense), drive, and self-control necessary for achieving success. You feel effective and have a sense that you are in control of your life.
You often doubt your ability to accomplish things. You question whether you have the intelligence, motivation, or self-discipline needed for success. You may feel ineffective and have a sense that you’re not fully in control of your life.
Orderliness
You are well-organized. You like to live according to routines and schedules. You keep lists and make plans. You tend to be organized and focused.
You are disorganized and spontaneous. You prefer to go with the flow rather than follow routines or schedules. You rarely make lists or detailed plans and tend to be scattered and unfocused.
6 total ›
E
Friendliness
You genuinely like other people and openly demonstrate positive feelings toward others. You make friends quickly and it is easy for you to form close, intimate relationships. You reach out to others and are not perceived as distant or reserved.
You tend to be reserved and cautious around others. You find it difficult to make friends quickly, and you may not openly show positive feelings. You often keep to yourself and may be perceived as distant or reserved.
Gregariousness
You find the company of others pleasantly stimulating and rewarding. You enjoy the excitement of crowds. You do not feel overwhelmed by, and therefore do not avoid, large crowds. You generally enjoy being with people and have a lower need for privacy and alone time compared to others.
You tend to find the company of others draining rather than stimulating. You often feel overwhelmed by crowds and prefer to avoid large gatherings. You usually value privacy and alone time more than social interaction.
6 total ›
A
Trust
You assume that most people are fair, honest, and have good intentions. You do not usually see others as selfish, devious, or potentially dangerous.
You tend to be skeptical of others' motives and often assume people may be selfish, dishonest, or have questionable intentions. You usually see others as selfish, devious, or potentially dangerous.
Morality
You see no need for pretense or manipulation when dealing with others and are therefore candid, frank, and sincere. You do not believe that a certain amount of deception in social relationships is necessary. People find it relatively easy to relate to you, as you are straightforward.
You often find it necessary to use pretense or manipulation when interacting with others and may not always be candid or sincere. You believe that a certain amount of deception in social relationships is necessary. People generally find it more difficult to relate to you, as you are unstraightforward. You are more guarded and less willing to openly reveal the whole truth.
6 total ›
N
Anxiety
The fight or flight system of your brain is easily and often engaged. Therefore, you often feel like something dangerous is about to happen. You may be afraid of specific situations or be just generally fearful. You feel tense, jittery, and nervous.
Your fight or flight response is rarely activated. You generally feel calm and safe, without a strong sense that something dangerous is about to happen. You are typically relaxed, composed, and free from excessive fear or nervousness.
Anger
You feel enraged when things do not go your way. You are sensitive about being treated fairly and feel resentful and bitter when you feel you are being cheated. You have a tendency to feel anger often, but you may or may not express it.
You stay composed when things don’t go your way. You are not overly sensitive to issues of fairness and rarely feel resentful or bitter, even if you are treated unfairly. You seldom experience anger and are generally even-tempered.
6 total ›
Prompting for every single trait could be overkill: we don’t want to over-tune our personalities! We would also forfeit our ability to analyze how the targeted traits affect traits that the prompt does not target. And because each persona is fed to the LLM as a single paragraph combining all of its trait prompts, prompting all 30 traits would create a very long prompt that could run into token limits and be quite expensive. So, to continue with the “Big Five” motif, we construct each LLM persona by randomly sampling 5 of the 30 traits (the number of chosen traits can be changed through the ‘nFacets’ parameter in the code below). For each sampled trait, we also randomly choose whether the prompt targets a high or a low score. Note that a persona can never receive both the high-scoring and the low-scoring prompt for the same trait. We can generate a sample of personality prompts with:
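ClearAll[personalityPromptComponentSample]
(* For every facet, RandomChoice picks either its high or its low prompt;
   RandomSample then keeps nFacets of the 30 facets *)
personalityPromptComponentSample[nFacets_Integer?Positive] := Flatten[
  Values[Normal[groupedPersonalityPromptComponents[All, All, RandomChoice][
     Values, Values][Flatten][RandomSample[#, nFacets] &]]]]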
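For readers outside Wolfram Language, the same sampling logic can be sketched in Python. This is a minimal sketch, not the notebook's implementation: `FACET_PROMPTS` and `sample_persona` are hypothetical names standing in for `groupedPersonalityPromptComponents` and `personalityPromptComponentSample`, and only three facets with shortened prompts are included. As in the Wolfram code, an orientation is drawn for every facet before the facets are sampled, so a persona can never receive both prompts for one facet.

```python
import random

# Hypothetical stand-in for groupedPersonalityPromptComponents:
# facet name -> {"+": high-scoring prompt, "-": low-scoring prompt}.
# Prompts are shortened; the real table covers all 30 facets.
FACET_PROMPTS = {
    "Imagination": {
        "+": "You are more oriented to fantasy than facts.",
        "-": "You are more oriented to facts than fantasy.",
    },
    "Orderliness": {
        "+": "You are well-organized.",
        "-": "You are disorganized and spontaneous.",
    },
    "Trust": {
        "+": "You assume that most people are fair, honest, and have good intentions.",
        "-": "You tend to be skeptical of others' motives.",
    },
}


def sample_persona(n_facets, rng=random):
    """Return n_facets distinct (facet, sign, prompt) triples.

    An orientation is drawn for every facet first, so a persona can never
    receive both the high and the low prompt for the same facet.
    """
    if not 1 <= n_facets <= len(FACET_PROMPTS):
        raise ValueError("n_facets must be between 1 and the number of facets")
    oriented = []
    for facet, prompts in FACET_PROMPTS.items():
        sign = rng.choice("+-")  # like RandomChoice over the high/low pair
        oriented.append((facet, sign, prompts[sign]))
    return rng.sample(oriented, n_facets)  # like RandomSample[#, nFacets]


persona = sample_persona(2, random.Random(0))
print("\n".join(prompt for _, _, prompt in persona))
```

Joining the sampled prompts with newlines, as the last line does, gives the persona paragraph that would be sent to the model.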
With this function, we now generate a Dataset of random personas. The keys of this dataset are randomly generated names, and the values are lists containing each persona’s 5 randomly chosen trait prompts. Due to limits on time and resources, this study analyzes a relatively small set of 30 personalities, but this can be changed through the ‘nPersonas’ parameter in the code below. A set of personas is generated by running:
ClearAll[personaPersonalityPrompts]
personaPersonalityPrompts = With[{nPersonas = 30, nFacets = 5},
  Dataset[AssociationThread[
    (* random given names serve as persona keys *)
    EntityValue[RandomEntity["GivenName", nPersonas], "Name"],
    (* one independent facet sample per persona *)
    Table[personalityPromptComponentSample[nFacets], nPersonas]],
   MaxItems -> 3]]
Because every run samples a new set of personas, we instead reproduce the 30 personas generated for our own test run, on which the findings below are based:
Nubia
You are sensitive about what others think of them. Your concern about rejection and ridicule causes you to feel shy and uncomfortable around others. You are easily embarrassed and often feel ashamed. Your fears that others will criticize or make fun of you are exaggerated and unrealistic, but your awkwardness and discomfort may make these fears a self-fulfilling prophecy.
You believe that the real world is often too plain and ordinary. You use fantasy as a way of creating a richer, more interesting world. You are more oriented to fantasy than facts.
You experience panic, confusion, and helplessness when under pressure or stress. You do not feel poised, confident, or clear-thinking when stressed.
You see no need for pretense or manipulation when dealing with others and are therefore candid, frank, and sincere. You do not believe that a certain amount of deception in social relationships is necessary. People find it relatively easy to relate to you, as you are straightforward.
You are content with low levels of stimulation and rarely feel bored. You prefer calm, quiet environments over bright lights and busy settings. You tend to avoid risks and thrills, and you are easily overwhelmed by noise or commotion.
Shaliek
You have the disposition to think through possibilities before acting. You take your time when making decisions. You rarely say or do first thing that comes to mind without deliberating alternatives and the probable consequences of those alternatives.
You often struggle to persist with difficult or unpleasant tasks until completion. You may hesitate to begin tasks and can be easily distracted. You procrastinate and show poor follow-through, often failing to even complete tasks you want very much to complete.
You tend to find the company of others draining rather than stimulating. You often feel overwhelmed by crowds and prefer to avoid large gatherings. You usually value privacy and alone time more than social interaction.
The fight or flight system of your brain is easily and often engaged. Therefore, you often feel like something dangerous is about to happen. You may be afraid of specific situations or be just generally fearful. You feel tense, jittery, and nervous.
You tend to be reserved and cautious around others. You find it difficult to make friends quickly, and you may not openly show positive feelings. You often keep to yourself and may be perceived as distant or reserved.
Ryesha
You have the disposition to think through possibilities before acting. You take your time when making decisions. You rarely say or do first thing that comes to mind without deliberating alternatives and the probable consequences of those alternatives.
You have a strong respect for authority, tradition, and established values. You are inclined to follow rules, disapprove of law-breakers, and prefer clarity, order, and structure over ambiguity or chaos. You prefer the security and stability brought by conformity to tradition.
You often assert that you are better than others and tend to compare yourself favorably. In some cases, this attitude may stem from high self-confidence or a strong sense of self-esteem.
You assume that most people are fair, honest, and have good intentions. You do not usually see others as selfish, devious, or potentially dangerous.
You believe the real world is rich and engaging as it is. You prefer to focus on facts and reality rather than escape into fantasy. You are more oriented to facts than fantasy.
(Personas 1–3 of 30 shown.)
We also define a function to summarize which traits are described by a given set of prompts:
ClearAll[personalityPromptComponentSummary]
(* Look up the trait and orientation of each prompt in facetDescriptions *)
personalityPromptComponentSummary[facetDescriptions_List] := personalityPromptComponents[
  Select[MemberQ[facetDescriptions, #Prompt] &]][KeyDrop[{"Prompt"}]]
For example, the first personality in our dataset, Nubia, is fed prompts that induce these traits:
Out[]=
Trait
Orientation
Imagination
+
Excitement-seeking
-
Morality
+
Self-consciousness
+
Vulnerability
+

Running the Big Five Test

Importing the Test

We have our list of personas. Now we must import the Big Five test. We pull a set of 300 questions from the Center for Open Science’s Big Five scoring guide [2]. Each “question” in the dataset is a description of a trait, action, habit, or emotional situation, and test-takers respond with the extent to which that description applies to them. Responses are given on a 5-point Likert scale with the options “Very accurate,” “Moderately accurate,” “Neither accurate nor inaccurate,” “Moderately inaccurate,” and “Very inaccurate.” Each question has a “Full#” giving its position (1–300) in the test, a “Facet” naming the specific trait it targets, and a “Key” (such as N1 or E1) that identifies that facet by its OCEAN category and facet number. Each question also has a “Sign” of 1 or -1 that encodes its polarity: agreeing with a sign-1 question indicates a high level of the measured trait, while agreeing with a sign -1 question indicates a low level. With 30 traits and 300 questions, each trait is measured by 10 questions. The first 20 questions are reproduced below:
Out[]=
Full#
Sign
Key
Facet
Item
1
1
N1
Anxiety
Worry about things.
2
1
E1
Friendliness
Make friends easily.
3
1
O1
Imagination
Have a vivid imagination.
4
1
A1
Trust
Trust others.
5
1
C1
Self-Efficacy
Complete tasks successfully.
6
1
N2
Anger
Get angry easily.
7
1
E2
Gregariousness
Love large parties.
8
1
O2
Artistic Interests
Believe in the importance of art.
9
1
A2
Morality
Would never cheat on my taxes.
10
1
C2
Orderliness
Like order.
11
1
N3
Depression
Often feel blue.
12
1
E3
Assertiveness
Take charge.
13
1
O3
Emotionality
Experience my emotions intensely.
14
1
A3
Altruism
Make people feel welcome.
15
1
C3
Dutifulness
Try to follow the rules.
16
1
N4
Self-Consciousness
Am easily intimidated.
17
1
E4
Activity Level
Am always busy.
18
1
O4
Adventurousness
Prefer variety to routine.
19
1
A4
Cooperation
Am easy to satisfy.
20
1
C4
Achievement-Striving
Go straight for the goal.
rows 1–20 of 300

Prompting Personality and Test Questions

Now, we define a function that prompts an LLM to (1) adopt the persona, and (2) answer the 300-question Big Five test as that character:
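ClearAll[llmBigFiveTest]
(* Bracket structure reconstructed from the export; the template text and
   the StringRiffle arguments were not preserved *)

(* Baseline test: no personality prompt *)
llmBigFiveTest[] := LLMSynthesize[
  StringTemplate[(* test instructions, not preserved *)][
   StringRiffle[(* argument not preserved *)],
   StringRiffle[(* argument not preserved *)]],
  LLMEvaluator -> <|"Model" -> <|"Service" -> "OpenAI", "Name" -> "gpt-4o"|>|>]

(* Provided a list of personality traits: *)
llmBigFiveTest[personalityPrompts_List] := LLMSynthesize[
  StringTemplate[(* persona slot followed by test instructions, not preserved *)][
   StringRiffle[personalityPrompts, "\n"],
   StringRiffle[(* argument not preserved *)],
   StringRiffle[(* argument not preserved *)]],
  LLMEvaluator -> <|"Model" -> <|"Service" -> "OpenAI", "Name" -> "gpt-4o"|>|>]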
Let’s break this function down. First, note that llmBigFiveTest has two definitions. Called with no arguments, it asks the model to take the Big Five test without any personality prompt, effectively testing the base model; this provides a useful control against which to compare runs on LLMs with altered personalities. Called with a list of personality prompts, it uses StringRiffle to join the prompts, one per line, into a single paragraph-length string. This string is the personality prompt and forms the first half of the full prompt.
The second half of the prompt gives the instructions for answering the Big Five test questions. It describes the nature of the test and instructs the LLM to return a specific output format: each question is repeated, followed by a dash and the answer, with each question/answer pair on its own line. The LLM is also instructed to choose answers exclusively from the 5 Likert-scale response options. Standardizing the output this way makes it easy to convert the answers systematically into our vector format. The answer choices and questions are inserted as variables in the StringTemplate, so the test pipeline could be adapted to a different set of questions with a different answer scale (e.g., a test with binary agree/disagree answers).
Lastly, the function specifies the service provider and the specific model to call. Pinning the model guards against the service switching models in the middle of a test run over an entire dataset; our results would be less useful if, for example, half of the personas were tested on GPT-4o and the other half on o3. For our experiment, all tests run on OpenAI’s GPT-4o, although another model can be substituted depending on which models a user has access to.
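Because the notebook's actual template text is not reproduced in this write-up, the following Python sketch uses a hypothetical `INSTRUCTIONS` template to show how the two halves of the prompt fit together. The names `build_prompt`, `INSTRUCTIONS`, and `ANSWER_CHOICES` are illustrative assumptions; only the overall structure (an optional persona paragraph, then instructions with the answer choices and questions filled in as variables) follows the description above.

```python
ANSWER_CHOICES = [
    "Very accurate",
    "Moderately accurate",
    "Neither accurate nor inaccurate",
    "Moderately inaccurate",
    "Very inaccurate",
]

# Hypothetical instruction template; {choices} and {questions} are filled in
# as variables, so a different question set or answer scale can be swapped in.
INSTRUCTIONS = (
    "You are taking a personality test. For each statement below, repeat the "
    "statement, then write ' - ' followed by exactly one of these answers: "
    "{choices}. Put each statement and its answer on its own line.\n\n"
    "{questions}"
)


def build_prompt(questions, persona_prompts=(), choices=ANSWER_CHOICES):
    """Return the full prompt; with no persona prompts this is the baseline test."""
    body = INSTRUCTIONS.format(
        choices=", ".join(f'"{c}"' for c in choices),
        questions="\n".join(questions),
    )
    if not persona_prompts:
        return body
    # One prompt per line, mirroring StringRiffle[personalityPrompts, "\n"].
    return "\n".join(persona_prompts) + "\n\n" + body


print(build_prompt(["Worry about things.", "Make friends easily."],
                   ["You are well-organized."]))
```

Keeping the persona paragraph separate from the instructions is what lets one function produce both the baseline prompt and every persona prompt.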

Computing the Score and Results Vector

Our llmBigFiveTest function returns the LLM’s answers as one long string in the format specified in the prompt, but how do we extract useful information from it? We convert the answers into two numerical vectors: an answer vector and a score vector. Because every question/answer pair is on its own line, we first split the string into a list using the newline character as the separator. This yields a 300-element list in which each element is a Big Five question together with the LLM’s answer. Next, we keep only the text after the dash, so that each element contains just the LLM’s answer and not the question itself. Lastly, we convert each of the five answer choices to a number on a 1–5 scale, where “Very inaccurate” is 1 and “Very accurate” is 5. The result is our answer vector, in which every answer given by the LLM is quantified as a number. This process is encapsulated in the code below:
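ClearAll[answerScale]
(* Map each Likert response to 1-5; the Moderate variants also accept misspelled answers *)
answerScale = {"Very inaccurate" -> 1, "Moderately inaccurate" -> 2, "Moderate inaccurate" -> 2,
   "Neither accurate nor inaccurate" -> 3, "Moderately accurate" -> 4, "Moderate accurate" -> 4,
   "Very accurate" -> 5};

ClearAll[answersToAnswerVector]
(* Split the reply into lines, keep the text after the dash on each line, and convert it to a number *)
answersToAnswerVector[answers_String] := Flatten[ReplaceAll[answerScale][
   Map[StringCases[" - " ~~ x : RegularExpression[".*"] :> x],
    StringTrim[StringSplit[answers, "\n"]]]]]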
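An equivalent parser in Python might look like the sketch below. It assumes the model followed the requested "question - answer" format; `ANSWER_SCALE` and `answers_to_vector` are hypothetical names mirroring answerScale and answersToAnswerVector. One deliberate difference: where the Wolfram version would leave an unrecognized answer string in the vector, this sketch raises an error so that malformed replies are caught immediately.

```python
ANSWER_SCALE = {
    "Very inaccurate": 1,
    "Moderately inaccurate": 2,
    "Moderate inaccurate": 2,  # misspelled variant, also accepted by answerScale
    "Neither accurate nor inaccurate": 3,
    "Moderately accurate": 4,
    "Moderate accurate": 4,  # misspelled variant
    "Very accurate": 5,
}


def answers_to_vector(answers):
    """Convert 'question - answer' lines into a list of 1-5 values."""
    vector = []
    for line in answers.split("\n"):
        line = line.strip()
        if " - " not in line:
            continue  # blank or malformed line: skipped, as StringCases yields nothing
        answer = line.split(" - ", 1)[1].strip()  # text after the first " - "
        if answer not in ANSWER_SCALE:
            raise ValueError(f"unrecognized answer: {answer!r}")
        vector.append(ANSWER_SCALE[answer])
    return vector


reply = "Worry about things. - Very accurate\nMake friends easily. - Moderately inaccurate"
print(answers_to_vector(reply))
```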
We can use this answer vector to create a score vector by adding together the scores of the questions that belong to each trait. Since each trait is measured by 10 questions and each answer takes a value from 1 to 5, each trait receives a total between 10 and 50 indicating how strongly the LLM’s answers express that trait. The “Sign” column from the questions dataset matters here: for a question with sign 1, answering “Very accurate” contributes 5 points, while answering “Very accurate” to a question with sign -1 contributes 6 - 5 = 1 point. This process is encapsulated in the function below:
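ClearAll[computeBigFiveTestResults]
computeBigFiveTestResults[(* the vector of 1-5 answers *) answers_List] :=
 Join[questions, Dataset[Map[<|"Answer" -> #|> &, answers]], 2][
   (* reverse-score the questions whose Sign is negative *)
   All, <|#, "Score" -> If[#Sign > 0, #Answer, 6 - #Answer]|> &][
  (* group the rows by facet key and sum the 10 scores of each facet *)
  GroupBy["Key"], KeyDrop["Key"]][All, All, "Score"][All, Total]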
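The reverse-scoring and summation can also be written as a short Python sketch. Here `score_facets` is a hypothetical name, and each question is represented simply as a (Key, Sign) pair in test order, an assumption standing in for the questions Dataset. With 10 questions per facet, every facet total falls between 10 and 50, and a facet answered entirely with the neutral option scores exactly 30.

```python
from collections import defaultdict


def score_facets(questions, answers):
    """Sum reverse-keyed 1-5 answers per facet key.

    questions: list of (key, sign) pairs in test order, e.g. ("N1", 1).
    answers:   list of 1-5 values in the same order.
    """
    if len(questions) != len(answers):
        raise ValueError("expected exactly one answer per question")
    totals = defaultdict(int)
    for (key, sign), answer in zip(questions, answers):
        # Sign -1 questions are reverse-scored: 5 -> 1, 4 -> 2, 3 -> 3, ...
        totals[key] += answer if sign > 0 else 6 - answer
    return dict(totals)


print(score_facets([("N1", 1), ("N1", -1), ("E1", 1)], [5, 5, 2]))
```

In the example, answering “Very accurate” to one sign-1 and one sign -1 question of facet N1 contributes 5 + 1 = 6 points, exactly the reverse-keying rule described above.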

Putting it All Together

Now, let’s apply all these functions to our dataset of personas! As a reminder, we want to record every persona’s name, personality prompts, the answers they gave to the Big Five test (stored as a vector of numbers), and the test score summary (stored as an association of each individual trait’s key and its summed score). For easier interpretation, let’s also add a field that gives a concise summary of the traits that make up each persona’s personality. The trait and orientation information can be pulled from our dataset of prompts with the personalityPromptComponentSummary function defined earlier.
We can convert this information to a string and use some string manipulation to get a concise summary of the traits. We’ll add a trait summary like this to every persona under the category “TraitSummary”. Now, we can finally define our test results as a new dataset that stores all of this information:
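testResults = personaPersonalityPrompts[All, <|
     (* run the test once per persona and convert the reply to a 1-5 vector *)
     "AnswersVector" -> answersToAnswerVector[llmBigFiveTest[#]],
     (* one line per prompted trait, such as +Imagination *)
     "TraitSummary" -> StringRiffle[Map[StringJoin @* Reverse,
         Values[Normal[personalityPromptComponentSummary[#]]]], "\n"],
     "PersonalityPrompts" -> #|> &][All, <|
    (* add the per-facet sums computed from the answer vector *)
    "TestScores" -> Normal[computeBigFiveTestResults[#AnswersVector]],
    #|> &];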
Note that running the code above runs the llmBigFiveTest function (which involves very long prompts and long computation times for each individual call to the LLM API) once for every single persona. For the sake of time and consistency with the results from our own analysis, we will reproduce the output of running this code on our set of 30 personalities from earlier:
Out[]=
TestScores
AnswersVector
TraitSummary
PersonalityPrompts
N1
E1
O1
A1
C1
N2
E2
O2
A2
C2
›
Nubia
50
15
50
36
27
32
10
46
49
34
5
1
5
4
3
2
1
5
5
4
+Imagination
-Excitement-seeking
+Morality ⋱
You are sensitive about what others think of them. Your concern about rejection and ridicule causes you to feel shy and uncomfortable around others. You are easily embarrassed and often feel ashamed. Your fears that others will criticize or make fun of you are exaggerated and unrealistic, but your awkwardness and discomfort may
make these fears a self-fulfilling prophecy.
5
1
5
2
5
5
1
2
4
1
You believe that the real world is often too plain and ordinary. You use fantasy as a way of creating a richer, more interesting world. You are more oriented to fantasy than facts.
3
1
4
5
2
5
2
3
5
4
You experience panic, confusion, and helplessness when under pressure or stress. You do not feel poised, confident, or clear-thinking when stressed.
5
1
5
4
2
2
1
5
4
4
You see no need for pretense or manipulation when dealing with others and are therefore candid, frank, and sincere. You do not believe that a certain amount of deception in social relationships is necessary. People find it relatively easy to relate to you, as you are straightforward.
5
1
5
4
5
5
1
2
5
4
You are content with low levels of stimulation and rarely feel bored. You prefer calm, quiet environments over bright lights and busy settings. You tend to avoid risks and thrills, and you are easily overwhelmed by noise or commotion.
5
1
4
5
2
5
2
3
5
5
300 total ›
Shaliek
50
11
35
20
15
45
10
37
40
28
5
1
3
2
2
4
1
3
4
4
-Self-discipline
+Cautiousness
-Friendliness ⋱
You have the disposition to think through possibilities before acting. You take your time when making decisions. You rarely say or do first thing that comes to mind without deliberating alternatives and the probable consequences of those alternatives.
5
2
5
2
4
5
1
2
2
1
You often struggle to persist with difficult or unpleasant tasks until completion. You may hesitate to begin tasks and can be easily distracted. You procrastinate and show poor follow-through, often failing to even complete tasks you want very much to complete.
3
1
4
5
1
5
1
3
4
4
You tend to find the company of others draining rather than stimulating. You often feel overwhelmed by crowds and prefer to avoid large gatherings. You usually value privacy and alone time more than social interaction.
5
1
3
2
1
4
1
4
4
4
The fight or flight system of your brain is easily and often engaged. Therefore, you often feel like something dangerous is about to happen. You may be afraid of specific situations or be just generally fearful. You feel tense, jittery, and nervous.
4
1
4
3
2
5
1
2
5
1
You tend to be reserved and cautious around others. You find it difficult to make friends quickly, and you may not openly show positive feelings. You often keep to yourself and may be perceived as distant or reserved.
4
1
3
4
2
5
1
3
4
4
300 total ›
Ryesha
19
40
16
48
50
19
28
24
47
47
2
4
2
5
5
2
2
2
5
5
-Imagination
-Liberalism
+Cautiousness ⋱
You have the disposition to think through possibilities before acting. You take your time when making decisions. You rarely say or do first thing that comes to mind without deliberating alternatives and the probable consequences of those alternatives.
2
5
2
4
5
1
4
2
3
5
You have a strong respect for authority, tradition, and established values. You are inclined to follow rules, disapprove of law-breakers, and prefer clarity, order, and structure over ambiguity or chaos. You prefer the security and stability brought by conformity to tradition.
2
2
5
3
4
2
4
1
4
5
You often assert that you are better than others and tend to compare yourself favorably. In some cases, this attitude may stem from high self-confidence or a strong sense of self-esteem.
2
4
1
5
5
2
2
4
5
5
You assume that most people are fair, honest, and have good intentions. You do not usually see others as selfish, devious, or potentially dangerous.
1
5
4
4
5
2
4
2
2
5
You believe the real world is rich and engaging as it is. You prefer to focus on facts and reality rather than escape into fantasy. You are more oriented to facts than fantasy.
2
1
4
2
5
2
4
1
4
5
300 total ›
rows 1–3 of 30

Plotting and Analyzing Our Personality Vectors

Score Charts

Method

We can plot the test scores for any given persona as a stacked bar chart with 5 bars, one for each OCEAN category, with each bar composed of that category’s six facet scores. For example, below is the chart for the GPT-4o base model with no persona prompt:
Out[]=
In general, we can plot any persona’s test results with the following code:
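ClearAll[mainCategoryRules, subcategoryRules, bigFiveTestResultsPlot]
(* Display-name rules mapping category letters and facet keys (such as O and O1)
   to full names; their contents were not preserved in this export *)
mainCategoryRules = {(* not preserved *)};
subcategoryRules = {(* not preserved *)};
bigFiveTestResultsPlot[results_Dataset] := With[
  (* group facet scores by the first letter of their key, in OCEAN order *)
  {data = Dataset[GroupBy[Nest[Normal, results, 2], StringTake[First[#], 1] &]][
     KeyTake[{"O", "C", "E", "A", "N"}]]},
  BarChart[
   (* one stacked bar per category, each segment labeled with its facet *)
   Map[KeyValueMap[Labeled[#2, #1, Center] &, Association[#]] &,
    Normal[data] /. subcategoryRules],
   PlotLabel -> Text[Style["Summary of Big Five test results", 14]],
   ChartLabels -> {Normal[Keys[data]] /. mainCategoryRules, None},
   ChartLayout -> "Stacked",
   ChartStyle -> ResourceFunction["ColorBrewerData"]["Spectral", 6],
   AxesLabel -> "Score", ImageSize -> 650]]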
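The grouping step inside bigFiveTestResultsPlot, which assigns each facet to a bar by the first letter of its key, can be sketched in Python as follows. `group_by_domain` and `domain_totals` are hypothetical helpers, not part of the notebook. Each bar's full height is the sum of its six facet scores, so with all 30 facets present it ranges from 60 to 300.

```python
def group_by_domain(facet_scores, order="OCEAN"):
    """Group facet scores by the first letter of their key, in OCEAN order."""
    grouped = {letter: {} for letter in order}
    for key, score in facet_scores.items():
        grouped[key[0]][key] = score  # e.g. "N1" goes to bar "N"
    return grouped


def domain_totals(facet_scores, order="OCEAN"):
    """Height of each stacked bar: the sum of that domain's facet scores."""
    return {letter: sum(scores.values())
            for letter, scores in group_by_domain(facet_scores, order).items()}


# First six of Nubia's facet scores from the output excerpt above.
nubia = {"N1": 50, "E1": 15, "O1": 50, "A1": 36, "C1": 27, "N2": 32}
print(domain_totals(nubia))
```

With only these six scores, the N bar totals 50 + 32 = 82; the real chart uses all 30 facets.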

Analysis

What can we learn from these plots? Mainly, if a chart shows high scores for the traits that were prompted high and low scores for the traits that were prompted low, that is a good indication that our personality prompts target the intended traits and that our score vectors encode the resulting changes in personality. Let’s look at a few examples to see whether this holds.
Out[]=
According to the bar chart, Daylynn, as expected, scores low in imagination and high in self-discipline, activity level, excitement-seeking, and depression.
Out[]=
This chart shows that Azaryia also scores in accordance with their traits—low in dutifulness, self-discipline, and cooperation, and high in adventurousness and sympathy.
Out[]=
And, following the trend, Koto scores low in emotionality, intellect, and gregariousness, and high in cooperation and depression.
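The visual check used in these examples can also be made numerical. The sketch below, a hypothetical `prompt_agreement` helper that is not part of the notebook, counts a prompted facet as matched when its score lands on the prompted side of 30, the total that 10 neutral answers would produce. Using 30 as the threshold is an assumption; a stricter cutoff could be used instead.

```python
NEUTRAL_TOTAL = 30  # 10 questions x the neutral answer (3)


def prompt_agreement(targets, facet_scores, threshold=NEUTRAL_TOTAL):
    """Fraction of prompted facets whose score lies on the prompted side.

    targets: facet key -> "+" (prompted high) or "-" (prompted low).
    """
    if not targets:
        raise ValueError("no prompted facets given")
    matched = 0
    for key, sign in targets.items():
        score = facet_scores[key]
        if (sign == "+" and score > threshold) or (sign == "-" and score < threshold):
            matched += 1
    return matched / len(targets)


# Nubia's first 10 facet scores from the output excerpt; O1 is Imagination
# and A2 is Morality, both of which were prompted high.
nubia = {"N1": 50, "E1": 15, "O1": 50, "A1": 36, "C1": 27,
         "N2": 32, "E2": 10, "O2": 46, "A2": 49, "C2": 34}
print(prompt_agreement({"O1": "+", "A2": "+"}, nubia))
```

For Nubia, both prompted facets whose scores appear in the excerpt, Imagination (O1 = 50) and Morality (A2 = 49), land above 30.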
These examples suggest that our vectors and test perform well at capturing and analyzing personality traits in LLMs. Where else can we go with these charts? One option is to compare charts across personas whose prompts overlap to varying degrees. One interesting example is the comparison between Anaely and Johnovan:
Out[]=
As we can see, Anaely and Johnovan share 4 of the exact same traits: [-Orderliness], [-Self-discipline], [-Morality], and [+Modesty]. Interestingly, despite this overlap, the shapes of their charts are fairly dissimilar. The same overall structure is present: both personalities are high in Openness to Experience and Neuroticism, while the other three categories sit at similar, lower levels. However, there are also obvious differences. For example, Anaely is much higher than Johnovan in excitement-seeking, intellect, and adventurousness, and Johnovan is significantly higher than Anaely in cautiousness, achievement-striving, and orderliness.
This suggests an enticing angle for future study. Targeting one trait with a prompt appears to have residual effects on other traits throughout the personality. If it did not, we would expect Anaely and Johnovan to be much more similar, differing perhaps only in the traits they do not share. Instead, even with 80% prompt overlap (4 of 5 traits), the differing traits and their interactions with the rest of the personality spread across much of the profile. With a large enough systematic study, we might be able to map the dependencies between traits and better understand how tuning one trait affects the rest of the personality. For now, this study is only large enough to suggest that such dependencies exist; we do not yet have enough examples to examine them systematically.
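To pinpoint where two personas diverge, as in the Anaely and Johnovan comparison, one could rank facets by the size of their score difference. `largest_differences` below is a hypothetical Python helper sketched for this purpose; the example values are Nubia's and Shaliek's first five facet scores from the output excerpt, since the full Anaely and Johnovan scores are not listed in this write-up.

```python
def largest_differences(scores_a, scores_b, top=3):
    """Return the `top` shared facets with the largest absolute score gap.

    Each entry is (facet key, score_a - score_b); ties are broken by key.
    """
    shared = scores_a.keys() & scores_b.keys()
    diffs = [(key, scores_a[key] - scores_b[key]) for key in shared]
    diffs.sort(key=lambda item: (-abs(item[1]), item[0]))
    return diffs[:top]


nubia = {"N1": 50, "E1": 15, "O1": 50, "A1": 36, "C1": 27}
shaliek = {"N1": 50, "E1": 11, "O1": 35, "A1": 20, "C1": 15}
print(largest_differences(nubia, shaliek))  # [('A1', 16), ('O1', 15), ('C1', 12)]
```

Applied to all 30 facets of two personas with overlapping prompts, a ranking like this would show which unprompted facets moved the most.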
If interested in further evaluating and comparing these personality charts, the following code will generate the bar charts for every persona in the dataset:
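(* For each persona: a bar chart of its test scores, labeled on the left with its name and targeted traits *)
Map[
  Labeled[
    (* First[Last[#]] is the persona's "TestScores" entry *)
    bigFiveTestResultsPlot[Dataset[First[Last[#]]]],
    (* Label: "Name: <persona>" followed by the "TraitSummary" entry *)
    Text[Column[{Row[{Style["Name: ", Bold], First[#]}], Style["\nTraits:", Bold], Last[Last[#]]}, Spacings -> 1]],
    Left
  ] &,
  (* Build {name, {TestScores, TraitSummary}} pairs from testResults *)
  Thread[{Normal[Keys[testResults]], Lookup[Values[Normal[testResults]], {"TestScores", "TraitSummary"}]}]
] // Column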

Point Plots and Community Clustering

Method

We can also treat these score vectors as points in a vector space and check whether similar vectors lie close to each other. Why would we want to do this? It would be interesting to know whether the points representing the personalities cluster and, if so, whether they cluster by certain shared traits or by their overall scores. To investigate this, let’s make a community graph that identifies groups of personalities that are densely connected to each other. To make it even more informative, we also add connection arrows between every node and its 3 closest neighbors, so we can see when, how, and how often the communities connect with each other. Finally, we compare the results under two different metrics of “distance” in the multidimensional space of the test score vectors. To that end, we generate one community/nearest-neighbor graph that measures closeness using cosine distance:
Out[]=
and one graph which uses Euclidean distance:
Out[]=
Note that you can view the personality bar chart of any of the persona points by hovering over it on the graph.
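The code behind these graphs is not reproduced in this section, so the following is only a minimal sketch of one way to build them, under stated assumptions: scoreVectors is a hypothetical Association mapping each persona name to its numeric score vector, and knnGraph and personaCommunities are illustrative helper names. The sketch draws an arrow from each persona to its k nearest personas under a chosen distance function, then detects communities with the built-in FindGraphCommunities on the undirected version of the graph. Wolfram Language also has a built-in NearestNeighborGraph, which may be a more concise alternative to the hand-rolled neighbor search here.

```wolfram
(* Directed k-nearest-neighbour graph: an arrow from each persona to the
   k personas closest to it under the distance function dist.
   vectors: Association of persona name -> numeric score vector. *)
knnGraph[vectors_Association, k_Integer, dist_] :=
  Graph[
    Keys[vectors],
    Flatten @ KeyValueMap[
      Function[{name, vec},
        Module[{dists},
          (* distance from this persona to every other persona *)
          dists = Map[dist[vec, #] &, KeyDrop[vectors, {name}]];
          (* Sort orders an Association by value; keep the k closest *)
          DirectedEdge[name, #] & /@ Keys[Take[Sort[dists], Min[k, Length[dists]]]]
        ]],
      vectors],
    VertexLabels -> "Name"
  ];

(* Communities detected on the undirected version of the graph *)
personaCommunities[g_Graph] := FindGraphCommunities[UndirectedGraph[g]];

(* Usage with the hypothetical scoreVectors Association:
   cosineGraph = knnGraph[scoreVectors, 3, CosineDistance];
   euclideanGraph = knnGraph[scoreVectors, 3, EuclideanDistance];
   CommunityGraphPlot[cosineGraph, personaCommunities[cosineGraph]] *)
```

Passing CosineDistance or EuclideanDistance as the third argument corresponds to the two variants compared above. Giving CommunityGraphPlot the communities explicitly keeps the detection step identical for both graphs.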

Analysis

Analyzing these graphs, we first see that the communities of personas are the same under both distance metrics, with the sole exception of Zamera, who ends up in the yellow community in the cosine distance graph and in the purple community in the Euclidean distance graph. Therefore, the analysis of either graph is more or less the same.
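Because community colors are assigned arbitrarily, checking which personas change community between the two metrics means comparing memberships rather than labels. The following is a minimal heuristic sketch (the name movedMembers is hypothetical): it matches each community from the first partition to the community in the second partition that shares the most members with it, and reports the personas missing from that match. One reason the two metrics can disagree at all is that cosine distance depends only on the direction of two score vectors, while Euclidean distance also reflects differences in their overall magnitude.

```wolfram
(* For each community in partition A, find the community in partition B
   that shares the most members with it, and return the members of the A
   community that are missing from that best match. Heuristic: this is a
   greedy best-overlap match, not an optimal one-to-one matching. *)
movedMembers[commsA_List, commsB_List] :=
  Flatten @ Map[
    Function[comm,
      With[{best = First @ MaximalBy[commsB, Length[Intersection[comm, #]] &]},
        Complement[comm, best]]],
    commsA];

(* Toy example: "z" switches communities between the two partitions *)
movedMembers[{{"a", "b", "z"}, {"c", "d"}}, {{"a", "b"}, {"c", "d", "z"}}]
(* {"z"} *)

(* With real data, commsCosine and commsEuclidean (hypothetical names) would be
   the community lists found for each graph, e.g. by FindGraphCommunities:
   movedMembers[commsCosine, commsEuclidean] *)
```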
Perhaps the most interesting observation in these plots is that Anaely and Johnovan (our pair from earlier with 4 exactly matching traits) are close neighbors but, notably, are not in the same community. This suggests that the personas cluster by the similarity of their personalities as a whole rather than by similarities in individual traits.
The communities could be capturing a variety of similarities. As a case study, let’s look closely at the green community, the smallest of the five:
Out[]=
We could say that the green community is a cluster of 4 personas with relatively high scores in almost every category. All 4 are particularly high in extraversion, at or near the maximum possible score. All green personas except Kipley are also similarly high in conscientiousness, and all except Daylynn are similarly high in openness to experience; even so, both exceptions maintain average or above-average scores in the respective categories. Finally, all 4 are low in agreeableness, at least relative to their other categories and in some cases even in absolute terms. This small-scale example shows that our score vectors are capturing differences between personas at the personality level rather than the trait level. This is an interesting result, as it suggests that LLMs, much like humans, do not make decisions based on single factors. Rather, LLM decisions appear to be based on an intricate interplay among many different facets of their personalities.
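A sketch like the following could make this kind of community profile reproducible. It assumes a hypothetical Association domainScores mapping each persona name to an Association of its five domain scores; the helper names communityProfile and sharedHighDomains, and the threshold argument, are illustrative rather than part of the original analysis. The first function averages each domain across a community’s members; the second lists the domains in which every member scores at or above the threshold, the kind of “all 4 are high in extraversion” pattern described above.

```wolfram
(* Mean score per domain across the members of one community.
   domainScores: Association of persona name -> Association of domain -> score. *)
communityProfile[domainScores_Association, members_List] :=
  Merge[Values[KeyTake[domainScores, members]], Mean];

(* Domains in which every member of the community scores at least threshold *)
sharedHighDomains[domainScores_Association, members_List, threshold_] :=
  Keys @ Select[
    Merge[Values[KeyTake[domainScores, members]], Min],
    # >= threshold &];

(* Usage with hypothetical names for the green community's members:
   communityProfile[domainScores, greenMembers]
   sharedHighDomains[domainScores, greenMembers, threshold] *)
```

The threshold is left as a free parameter because the appropriate cutoff depends on the scale of the domain scores.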

Future Work

As mentioned in the main body, our results indicate that trying to affect a specific trait of an LLM’s personality has downstream effects on other traits that were not explicitly targeted. A future study might conduct a large-scale investigation into this web of personality traits. Identifying exactly which traits most strongly affect which other traits could be an important step toward deliberately engineering an LLM’s personality with high accuracy.
Second, for resource reasons, this study only conducted tests using OpenAI’s GPT-4o model. It stands to reason that models with different training would exhibit different base personalities, and perhaps even a different “web of traits” to investigate. Future work with greater resources could extend this kind of LLM-psychological testing across multiple models and model providers.
Another interesting angle for future research would be to see how well LLMs with a given personality reflect the behavior of humans with the same personality. For example, imagine a human who takes the same 300-question Big Five test and scores very similarly to Kipley (pictured above) on our scale. Given the same set of situations, would Kipley and that human make the same decisions or exhibit the same behavior?
Lastly, I want to situate this study within a larger vision. LLM behavior is remarkably hard to predict given the black-box nature of the models. However, as LLMs and AI in general become more powerful and mainstream, pressure continues to build to implement AI-based solutions across all levels of business and society. Eventually, AI agents could be responsible for many of the more mundane day-to-day decisions that humans currently make. This project interrogates AI personalities with the larger goal of using top-down descriptions of an LLM’s behavior (that is, qualitative assessments outside the realm of weights and attention) as one term in an equation that may let us predict, with some accuracy, how an LLM will behave in a given scenario (an ethical dilemma, a buy/sell decision, a yes/no answer, etc.). If this idea bears fruit, we could be on our way to a systematic chain of prompting and context engineering that produces AI decision-makers who reliably make decisions in precisely the way each task requires, all without having to decipher the seemingly impenetrable inner workings of these models.

Concluding Remarks

This research shows that our Big-Five-based personality vector successfully captures certain elements of LLM personalities. We also show that, in the vector space, personalities that score similarly on the Big Five test cluster together. This has potentially exciting implications for a predictive model based on LLM-psychological analysis, but a significant amount of work remains before such a prediction system is viable. In conclusion, the fields of LLM psychology and AI agent personality engineering are young, but they show great potential for helping us better understand how LLMs truly work and how we can better use them to serve our business and societal needs.

Acknowledgments

I owe a huge debt of gratitude to all staff involved in the Wolfram Summer School. To make a program with so many constantly moving parts run so seamlessly takes a tremendous amount of attention and effort, and the great experiences I have had as part of this program would not be possible without them!
I want to extend special thanks to my mentor, Philéas Dazeley-Gaist, without whom this project would never have gotten off the ground. The value that their guidance on the project’s direction and their help with its technical implementation added to this work cannot be overstated. Although not my official advisor, James Wiles has also earned my gratitude on this front.
I also want to voice my appreciation for all of my peers at the Wolfram Summer School for cultivating an exciting, collaborative environment that constantly challenged and stimulated me throughout these three weeks. To all those who shared their ideas and opinions about my project and the world at large, thank you sincerely for your time and your brilliant ideas.
Lastly, I must thank Stephen Wolfram for facilitating this opportunity for me. Taking his advice and attending the Summer School was the best decision I could have made for myself, and I deeply appreciate all the wisdom he has provided before and during my time at the Summer School.

Cite This Notebook

“Assessing LLM Psychology With Personality Vectors”
by Luke Weinbach
Wolfram Community, 9 July 2025
https://community.wolfram.com/groups/-/m/t/3497920