Web Scraping and Facial Expression Analysis
16 | Sentiment Analysis Dashboard to Compare Data Sources
Introduction
Previous chapters showed how to start with a PDF or website, import its text, run sentiment analysis and display the results in a grid. This chapter shows how to combine those building blocks into a mouse-driven interface for comparing multiple financial reports. Specifically, the dashboard compares the number of positive and negative words in the following three financial reports:
https://corporate.mcdonalds.com/content/dam/sites/corp/nfl/pdf/Exhibit%2099.1.pdf
https://corporate.mcdonalds.com/content/dam/sites/corp/nfl/pdf/Q120%20Exhibit%2099.1.pdf
https://www.fool.com/earnings/call-transcripts/2022/10/27/apple-aapl-q4-2022-earnings-call-transcript

Building a Helper Function or User-Defined Function

This dashboard is a pipeline of the same calculations created in previous chapters; only the URL at the start of the pipeline changes. The following input cell groups several of those calculations so that they all run at once:
In[]:=
fullText=Import["https://corporate.mcdonalds.com/content/dam/gwscorp/nfl/investor-relations-content/quarterly-results/Exhibit%2099.1.pdf","Plaintext"];
wordList=ToLowerCase[TextWords[fullText]];
posWordsLength=Length[Select[wordList,posTest]]
negWordsLength=Length[Select[wordList,negTest]]
Out[]=
37
Out[]=
9
Using those results, a shortened version of the TextGrid example from a previous chapter displays the number of positive words and the number of negative words:
In[]:=
TextGrid[{
{"Quantity of Positive Words:",posWordsLength},
{"Quantity of Negative Words:",negWordsLength}
},Frame->All]
Out[]=
Quantity of Positive Words: 37
Quantity of Negative Words: 9
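The first input cell of this section relies on posTest and negTest, which come from previous chapters and are not repeated here. As a rough sketch only, assuming they are predicates that check whether a word belongs to a word list, stand-ins could look like the following. The names samplePosWords, sampleNegWords, samplePosTest and sampleNegTest, and the three-word lists, are hypothetical placeholders chosen so they do not overwrite the real definitions:
In[]:=
(*hypothetical word lists, for illustration only*)
samplePosWords={"growth","strong","improved"};
sampleNegWords={"decline","loss","weak"};
(*stand-in predicates: True when the word is in the corresponding list*)
samplePosTest[word_String]:=MemberQ[samplePosWords,word];
sampleNegTest[word_String]:=MemberQ[sampleNegWords,word];
(*same pipeline as above, applied to a short sample sentence*)
sampleWords=ToLowerCase[TextWords["Strong growth offset a small loss."]];
{Length[Select[sampleWords,samplePosTest]],Length[Select[sampleWords,sampleNegTest]]}
Select keeps only the words for which the predicate returns True, so Length of the result is the word count shown in the grid.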
To shorten the code on the screen, a user-defined function (or helper function) can give a series of calculations a short name. The next calculations use the name importAndTagWords for a function that takes one URL as input. Before pasting in the calculations above, here is a simpler outline of a user-defined function. The := notation (a colon followed by an equal sign) is delayed assignment: the right-hand side is not evaluated when the function is defined but is rerun each time the function is called, using whatever URL is passed in:
In[]:=
importAndTagWords[url_]:=
(
(*one or more calculations;*)
(*one or more calculations;*)
0
)
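To see what delayed assignment changes in practice, the following small sketch compares = with := using throwaway names (x, immediate and delayed are illustrative only and are not part of the dashboard):
In[]:=
x=2;
immediate=x^2;(*right side evaluated once, at definition time*)
delayed:=x^2;(*right side re-evaluated on every use*)
x=3;
{immediate,delayed}
Out[]=
{4,9}
The same behavior is what lets a function defined with := rerun its body for each new URL instead of reusing a result computed once.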
The calculations from the beginning of this section can be copied and pasted into the importAndTagWords outline to make it useful for a dashboard. Each calculation is separated by a semicolon. The variables being defined are global variables for the sake of simplicity; other WolframU courses cover localizing variables and other useful programming conventions for larger-scale projects:
In[]:=
importAndTagWords[url_]:=
(
fullText=Import[url,"Plaintext"];
wordList=ToLowerCase[TextWords[fullText]];
posWordsLength=Length[Select[wordList,posTest]];
negWordsLength=Length[Select[wordList,negTest]];
TextGrid[{
{"Quantity of Positive Words:",posWordsLength},
{"Quantity of Negative Words:",negWordsLength}
},Frame->All]
)
Now that this series of five calculations has a name, the user-defined function can be used with any URL to perform the same calculations on a web-based PDF or a regular website. The following calculation uses the second URL from the introduction instead of the first. Notice that this second financial report has more negative words than positive words, the opposite of the first financial report:
In[]:=
importAndTagWords["https://corporate.mcdonalds.com/content/dam/sites/corp/nfl/pdf/Q120%20Exhibit%2099.1.pdf"]
Out[]=
Quantity of Positive Words: 7
Quantity of Negative Words: 18
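This section notes that importAndTagWords uses global variables for simplicity. As a hedged sketch of the alternative, Module can keep the intermediate values local so they do not overwrite anything else in the notebook. The name importAndTagWordsLocal and the local variable names are assumptions made for this illustration; posTest and negTest are still the definitions from previous chapters:
In[]:=
importAndTagWordsLocal[url_String]:=Module[{text,words,posCount,negCount},
(*import the page or PDF as plain text and split it into lowercase words*)
text=Import[url,"Plaintext"];
words=ToLowerCase[TextWords[text]];
(*count the words that pass each test*)
posCount=Length[Select[words,posTest]];
negCount=Length[Select[words,negTest]];
(*the last expression is the value returned by Module*)
TextGrid[{
{"Quantity of Positive Words:",posCount},
{"Quantity of Negative Words:",negCount}
},Frame->All]
]
It is called the same way as importAndTagWords, with a single URL string as input. The tradeoff is that values such as the imported text are no longer available for inspection after the function returns.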

Building a Mouse-Driven Dashboard

Wolfram Language contains many functions for creating mouse-driven animations and dashboards. A mouse-driven interface is an intuitive way to share results so that others can use them to inform decisions or to visualize an idea or discovery.
Manipulate is likely the most popular function for creating mouse-driven animations or dashboards in Wolfram Language. In this case, the final dashboard should offer a choice of three URLs. A list can organize the URLs, with each URL followed by an arrow (->) and the label to display in the dashboard. Each label is an image copied and pasted from a web browser (right-click the image, choose “Copy Image,” then paste into the Wolfram Notebook):
In[]:=
urlList={
"https://corporate.mcdonalds.com/content/dam/sites/corp/nfl/pdf/Exhibit%2099.1.pdf"->[pasted image],
"https://corporate.mcdonalds.com/content/dam/sites/corp/nfl/pdf/Q120%20Exhibit%2099.1.pdf"->[pasted image],
"https://www.fool.com/earnings/call-transcripts/2022/10/27/apple-aapl-q4-2022-earnings-call-transcript/"->[pasted image]
};
Manipulate is an extremely flexible function. The Wolfram Documentation Center contains many additional examples that show different mouse-driven controls (sliders, drop-down menus or multiple controls) and use cases.
Following is a simple starting point to get a feel for how Manipulate works. The tabs in the output correspond to the three URLs stored in the list urlList. When the user of the dashboard clicks a different tab, the StringLength of that URL is calculated and displayed. SaveDefinitions is an option that stores previous definitions in the dashboard, which is useful for sharing just the dashboard with no additional code. So this simple dashboard displays the number of characters in the URL string itself (81 for the first URL), not in the text of the page it points to:
In[]:=
Manipulate[StringLength[website],{website,urlList},SaveDefinitions->True]
Out[]=
​
website
81
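As one small illustration of the alternative controls mentioned above, the same starting point can be sketched with a drop-down menu instead of tabs by adding the ControlType option to the control specification. This is a minimal sketch that assumes urlList is defined as above:
In[]:=
Manipulate[StringLength[website],{website,urlList,ControlType->PopupMenu},SaveDefinitions->True]
A drop-down menu may be the more compact choice if the list of URLs grows beyond a handful of entries.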
Instead of calculating the StringLength of each URL, the dashboard can call the user-defined function importAndTagWords. When the user of the dashboard clicks a different tab, the function runs all five calculations and returns the TextGrid for that URL. By clicking through the tabs, the reader of the dashboard can easily compare the number of positive and negative words across multiple data sources:
In[]:=
Manipulate[
importAndTagWords[website],
{website,urlList},SaveDefinitions->True]
Out[]=
​
website
Quantity of Positive Words:
37
Quantity of Negative Words:
9
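Because the dashboard calls importAndTagWords on every tab click, each click imports the document again. One possible refinement, sketched here under the assumption that the documents do not change while the dashboard is in use, is to remember the result for each URL after the first call. The name importAndTagWordsCached is hypothetical:
In[]:=
(*on the first call for a URL, compute the grid and store it as a definition for that exact URL*)
importAndTagWordsCached[url_String]:=importAndTagWordsCached[url]=importAndTagWords[url];
Manipulate[
importAndTagWordsCached[website],
{website,urlList},SaveDefinitions->True]
After each tab has been visited once, switching tabs should reuse the stored grid rather than downloading the document again.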
It is also possible to share a dashboard and minimize the code completely. After creating a dashboard with Manipulate, the author can double-click the blue cell bracket corresponding to the output. That action will minimize the input to provide a cleaner look and feel. The following example shows the same dashboard with minimized input. The blue cell bracket with an arrow at the top can be double-clicked to show the input again:
In[]:=
Manipulate[
importAndTagWords[website],
{website,urlList},SaveDefinitions->True]
Out[]=
​
website
Quantity of Positive Words:
37
Quantity of Negative Words:
9

Summary

It is common to use Wolfram Language and Wolfram Notebooks to rapidly test data analysis ideas. But it’s equally common for an author of a data analysis project to create a dashboard to make the results easy for anyone to read. The code for the analysis can be included in a different section of a Wolfram Notebook, giving a nice balance between ease of use and the ability for others to validate or understand the methodology, if desired.