Who does the talking at The Office?
Who does the talking at The Office?
The Office as data
The Office as data
These days I’m into Mathematica, playing with the Wolfram Language again. So, when I came upon this website, I had to fool around a little. The website presents the dialouge from The Office TV series as data and even as a shake and bake R package. But, as I wanted to play in the Wolfram Language, I needed to get the data on a more general form: CSV
The following R code will create exactly that:
install.packages("schrute")
library(schrute)
library(readr)
write_csv(theoffice, "~/Downloads/theoffice.csv")
library(schrute)
library(readr)
write_csv(theoffice, "~/Downloads/theoffice.csv")
Get the data into Mathematica
theoffice=SemanticImport["~/Downloads/theoffice.csv"];
In[]:=
theoffice
In[]:=
Out[]=
Count the spoken words
Count the spoken words
Add a data column counting all the words spoken and a column for ordering the data table. This latter approach, which seem a bit hackey, is maybe due to my lack of knowledge and and to learn to do the right thing.
theofficeWC=theoffice[All,<|#,"wordsSpoken"StringCount[#text,RegularExpression["\\w+"]]|>&][GroupBy[#,KeyTake[{"season","character"}]KeyTake["wordsSpoken"],Total]&][Normal][All,Apply[Join]][All,<|#,"order"#season*100000+#wordsSpoken|>&][SortBy["order"]]
In[]:=
Out[]=
So, how may words in the first season?
BarChart[theofficeWC[Select[#season1&]][All,{"wordsSpoken"}]]
In[]:=
Out[]=
Reducing the character set
Reducing the character set
To make a fun visualization, I needed a way to filter the 500 or so different entities that utters at least one word. I myself like The Office, but I have not watched nearly enough to make any qualitative selection of roles or characters. Fortunately, I have a daughter that have watched everything.
She gave me a list of the five most important roles:
importantCharacters={"Dwight","Jim","Michael","Pam","Andy"};
In[]:=
I can now limit the data set to these five characters
theofficeWC[Select[MemberQ[importantCharacters,#character]&]]
In[]:=
Out[]=
Visualisation
Visualisation
I can now also write a function that, given a season, gives me the number of words spoken by these five characters. The funtion can also give me either the numbers or the characters that actually speaks in the given season.
season[i_,f_]:=theofficeWC[Select[MemberQ[importantCharacters,#character]&]][Select[#seasoni&]][SortBy["character"]][All,f]//Normal
In[]:=
E.g.
season[3,"character"]
In[]:=
{Andy,Dwight,Jim,Michael,Pam}
Out[]=
season[3,"wordsSpoken"]
In[]:=
{3679,8369,5940,24239,5566}
Out[]=
season[8,"wordsSpoken"]
In[]:=
{13536,11564,8332,4206}
Out[]=
season[8,"character"]
In[]:=
{Andy,Dwight,Jim,Pam}
Out[]=
Next up is a function, that shows this data
And as the finale, let’s create a animated GIF
This can be saved for sharing
What is next?
What is next?
Sentiment analysis of the characters. Is e.g. Michael’s sentiment evolving through the 9 seasons?
◼
...
◼