PER MØLDRUP-DALUM

Who does the talking at The Office?

The Office as data

These days I’m into Mathematica, playing with the Wolfram Language again. So when I came upon this website, I had to fool around a little. The website presents the dialogue from The Office TV series as data, and even as a shake-and-bake R package. But since I wanted to play in the Wolfram Language, I needed the data in a more general format: CSV.
The following R code will create exactly that:
install.packages("schrute")
library(schrute)
library(readr)
write_csv(theoffice, "~/Downloads/theoffice.csv")
Get the data into Mathematica
In[]:= theoffice=SemanticImport["~/Downloads/theoffice.csv"];
In[]:= theoffice
Out[]=
index | season | episode | episode_name | character | text | text_w_direction
1 | 1 | 1 | Pilot | Michael | All right Jim. Your quarterlies look very good. How are things at the library? | All right Jim. Your quarterlies look very good. How are things at the library?
358 | 1 | 1 | Pilot | Jim | Oh, I told you. I couldn't close it. So... | Oh, I told you. I couldn't close it. So...
715 | 1 | 1 | Pilot | Michael | So you've come to the master for guidance? Is this what you're saying, grasshopper? | So you've come to the master for guidance? Is this what you're saying, grasshopper?
1072 | 1 | 1 | Pilot | Jim | Actually, you called me in here, but yeah. | Actually, you called me in here, but yeah.
1429 | 1 | 1 | Pilot | Michael | All right. Well, let me show you how it's done. | All right. Well, let me show you how it's done.
1786 | 1 | 1 | Pilot | Michael | So that's the way it's done. | [on the phone] Yes, I'd like to speak to your office manager, please. Yes, hello ⋱
2143 | 1 | 1 | Pilot | Michael | I've, uh, I've been at Dunder Mifflin for 12 years, the last four as Regional Ma ⋱ | I've, uh, I've been at Dunder Mifflin for 12 years, the last four as Regional Ma ⋱
2500 | 1 | 1 | Pilot | Pam | Well. I don't know. | Well. I don't know.
2857 | 1 | 1 | Pilot | Michael | If you think she's cute now, you should have seen her a couple of years ago. [growls] | If you think she's cute now, you should have seen her a couple of years ago. [growls]
3214 | 1 | 1 | Pilot | Pam | What? | What?
3571 | 1 | 1 | Pilot | Michael | Any messages? | Any messages?
3928 | 1 | 1 | Pilot | Pam | Uh, yeah. Just a fax. | Uh, yeah. Just a fax.
4285 | 1 | 1 | Pilot | Michael | Oh! Pam, this is from Corporate. How many times have I told you? There's a speci ⋱ | Oh! Pam, this is from Corporate. How many times have I told you? There's a speci ⋱
4642 | 1 | 1 | Pilot | Pam | You haven't told me. | You haven't told me.
4999 | 1 | 1 | Pilot | Michael | It's called the wastepaper basket! Look at that! Look at that face. | It's called the wastepaper basket! Look at that! Look at that face.
5356 | 1 | 1 | Pilot | Michael | People say I am the best boss. They go, ""God we've never worked in a place like ⋱ | People say I am the best boss. They go, ""God we've never worked in a place like ⋱
5713 | 1 | 1 | Pilot | Dwight | I have no gifts for you. Pa rum pump um pum [Imitates heavy drumming] | [singing] Shall I play for you? Pa rum pump um pum [Imitates heavy drumming] I h ⋱
6070 | 1 | 1 | Pilot | Jim | My job is to speak to clients on the phone about... uh, quantities and type of c ⋱ | My job is to speak to clients on the phone about... uh, quantities and type of c ⋱
6427 | 1 | 1 | Pilot | Michael | Whassup! | Whassup!
6784 | 1 | 1 | Pilot | Jim | Whassup! I still love that after seven years. | Whassup! I still love that after seven years.
showing 1–20 of 55130

Count the spoken words

Add a column counting the words spoken, plus a second column for ordering the table. The ordering column seems a bit hacky and probably reflects my lack of knowledge; I still need to learn the right way to do it.
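A quick check of the counting rule may help here. The sketch below applies the same \w+ pattern used in the pipeline that follows to two lines from the first table; the helper name countWords is hypothetical and only there for illustration. Note that an apostrophe is not a word character, so a contraction such as couldn't is counted as two words.

```wolfram
(* Hypothetical helper: count runs of word characters,
   the same rule as the "wordsSpoken" column *)
countWords[s_String] := StringCount[s, RegularExpression["\\w+"]]

countWords["Uh, yeah. Just a fax."]                       (* 5 *)
countWords["Oh, I told you. I couldn't close it. So..."]  (* 10: "couldn" and "t" count separately *)
```

So the totals are a slight overcount for characters who use many contractions, which is probably harmless for a comparison between characters.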
In[]:= theofficeWC=theoffice[All,<|#,"wordsSpoken"->StringCount[#text,RegularExpression["\\w+"]]|>&][GroupBy[#,KeyTake[{"season","character"}]->KeyTake["wordsSpoken"],Total]&][Normal][All,Apply[Join]][All,<|#,"order"->#season*100000+#wordsSpoken|>&][SortBy["order"]]
Out[]=
season | character | wordsSpoken | order
1 | Michel | 1 | 100001
1 | Teammates | 1 | 100001
1 | Warehouse | 7 | 100007
1 | Kelly | 11 | 100011
1 | Madge | 14 | 100014
1 | Everybody | 19 | 100019
1 | Todd | 26 | 100026
1 | Travel | 26 | 100026
1 | Worker | 30 | 100030
1 | Man | 40 | 100040
1 | Lonny | 42 | 100042
1 | Toby | 58 | 100058
1 | Phyllis | 81 | 100081
1 | Darryl | 105 | 100105
1 | Meredith | 126 | 100126
1 | Angela | 133 | 100133
1 | Stanley | 156 | 100156
1 | Kevin | 158 | 100158
1 | Ryan | 219 | 100219
1 | Roy | 244 | 100244
showing 1–20 of 914
So, how many words are spoken in the first season?
In[]:= BarChart[theofficeWC[Select[#season==1&]][All,{"wordsSpoken"}]]
Out[]= [bar chart of wordsSpoken for every season 1 character]

Reducing the character set

To make a fun visualization, I needed a way to filter the 500 or so different entities that utter at least one word. I like The Office myself, but I have not watched nearly enough to make any qualitative selection of roles or characters. Fortunately, I have a daughter who has watched everything.
She gave me a list of the five most important roles:
In[]:= importantCharacters={"Dwight","Jim","Michael","Pam","Andy"};
I can now limit the data set to these five characters:
In[]:= theofficeWC[Select[MemberQ[importantCharacters,#character]&]]
Out[]=
season | character | wordsSpoken | order
1 | Pam | 1251 | 101251
1 | Dwight | 2215 | 102215
1 | Jim | 2429 | 102429
1 | Michael | 8875 | 108875
2 | Pam | 5622 | 205622
2 | Jim | 7291 | 207291
2 | Dwight | 8194 | 208194
2 | Michael | 26271 | 226271
3 | Andy | 3679 | 303679
3 | Pam | 5566 | 305566
3 | Jim | 5940 | 305940
3 | Dwight | 8369 | 308369
3 | Michael | 24239 | 324239
4 | Andy | 2335 | 402335
4 | Pam | 3820 | 403820
4 | Dwight | 5140 | 405140
4 | Jim | 5574 | 405574
4 | Michael | 20496 | 420496
5 | Andy | 5249 | 505249
5 | Pam | 6362 | 506362
showing 1–20 of 42

Visualisation

I can now also write a function that, given a season, returns either the word counts for these five characters or the names of those who actually speak in that season.
In[]:= season[i_,f_]:=theofficeWC[Select[MemberQ[importantCharacters,#character]&]][Select[#season==i&]][SortBy["character"]][All,f]//Normal
For example:
In[]:= season[3,"character"]
Out[]= {Andy,Dwight,Jim,Michael,Pam}
In[]:= season[3,"wordsSpoken"]
Out[]= {3679,8369,5940,24239,5566}
In[]:= season[8,"wordsSpoken"]
Out[]= {13536,11564,8332,4206}
In[]:= season[8,"character"]
Out[]= {Andy,Dwight,Jim,Pam}
Next up is a function that shows this data.
And as the finale, let’s create an animated GIF.
This can be saved for sharing.
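The function itself is not shown here, so the following is only a sketch of what such a function might look like, not the original code. The name seasonChart, its argument order, the labels, and the fixed plot range of 30000 (a guess just above the largest count in the tables, 26271) are all assumptions.

```wolfram
(* Hypothetical helper: bar chart of words spoken in one season.
   names and counts are parallel lists, as returned by
   season[i, "character"] and season[i, "wordsSpoken"]. *)
seasonChart[i_Integer, names_List, counts_List] := BarChart[
  counts,
  ChartLabels -> names,
  PlotRange -> {0, 30000},   (* assumed fixed range so seasons are comparable *)
  PlotLabel -> "Season " <> ToString[i],
  AxesLabel -> {None, "Words spoken"}
]

(* Season 3 values copied from the outputs above *)
seasonChart[3, {"Andy", "Dwight", "Jim", "Michael", "Pam"}, {3679, 8369, 5940, 24239, 5566}]
```

With the season function defined earlier, the call would presumably be seasonChart[3, season[3,"character"], season[3,"wordsSpoken"]].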
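As a rough sketch of the export step: a list of graphics exported to a .gif file becomes an animated GIF with one frame per list element. The file name, the one-second frame duration, the plot range, and the names data and frames are assumptions, and only the two seasons whose numbers appear above are used.

```wolfram
(* Word counts for two seasons, copied from the outputs above *)
data = <|
  3 -> <|"Andy" -> 3679, "Dwight" -> 8369, "Jim" -> 5940, "Michael" -> 24239, "Pam" -> 5566|>,
  8 -> <|"Andy" -> 13536, "Dwight" -> 11564, "Jim" -> 8332, "Pam" -> 4206|>
|>;

(* One bar chart per season; #1 is the season number, #2 its counts *)
frames = KeyValueMap[
  BarChart[Values[#2],
    ChartLabels -> Keys[#2],
    PlotRange -> {0, 30000},
    PlotLabel -> "Season " <> ToString[#1]] &,
  data];

(* Hypothetical file name; written to the current working directory *)
Export["theoffice-words.gif", frames,
  "DisplayDurations" -> 1,
  "AnimationRepetitions" -> Infinity]
```

ListAnimate[frames] gives a preview inside the notebook before anything is written to disk.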

What is next?

  • Sentiment analysis of the characters. Does Michael’s sentiment, for example, evolve through the nine seasons?
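A possible starting point for the sentiment idea, offered as a sketch rather than a worked analysis: the built-in "Sentiment" classifier assigns each string a class such as "Positive", "Negative" or "Neutral". The two sample lines come from the first table; the variable names are illustrative, and the classifier may need to be downloaded the first time it is used.

```wolfram
(* Two lines from the first table; any list of strings would do *)
lines = {
  "Whassup! I still love that after seven years.",
  "You haven't told me."
};

(* Built-in sentiment classifier: one class per input string *)
sentiments = Classify["Sentiment", lines];

(* Tally the classes, for example to compare seasons later *)
Counts[sentiments]
```

Applied to Michael’s lines season by season, the share of each class would give a first rough picture of whether his mood shifts over the series.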