Web Scraping and Sentiment Analysis

Displaying Results in a Grid

Introduction

Previous examples displayed final results in a grid. This chapter works through a more involved example that adds custom labels and several derived values to give the results more perspective.

Displaying Results in a Grid with Custom Labels

The previous chapters of this section defined several symbols that count the positive and negative words common to both an external dictionary and the text in an imported PDF, along with the predicted sentiment for each word in the PDF. As a first step, let’s make sure these definitions are intact and ready for use in a grid of results. We can simply evaluate the symbol names to confirm the stored values:

In[]:=

posWordsLength
negWordsLength
mL

Out[]=

{{Positive,1464},{Neutral,532},{Negative,471},{Indeterminate,19}}
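
If any of these names comes back unevaluated, for example in a fresh kernel session, the earlier chapters need to be evaluated again. As a minimal sketch, the built-in ValueQ function can check this directly, assuming the three symbol names used above:

In[]:=

{ValueQ[posWordsLength],ValueQ[negWordsLength],ValueQ[mL]}

Out[]=

{True,True,True}

An entry of False would mean that the corresponding symbol has no stored value yet.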

Since the sentiment analysis is already tallied and sorted, it can be used with TextGrid directly:

In[]:=

TextGrid[mL,Frame->All]

Out[]=

Positive	1464
Neutral	532
Negative	471
Indeterminate	19
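
For reference, a tallied and sorted list with the same shape as mL can be produced with Tally and SortBy. The sketch below uses a small made-up list of labels, not the data from the PDF; sampleLabels is a hypothetical name, and the labels are plain strings here:

In[]:=

sampleLabels={"Neutral","Positive","Negative","Positive","Positive","Neutral"};
SortBy[Tally[sampleLabels],-Last[#]&]

Out[]=

{{Positive,3},{Neutral,2},{Negative,1}}

Tally counts each distinct label, and sorting by the negated count puts the most frequent label first.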

When displaying the positive and negative word counts from the external dictionary, let’s include several additional results that provide more perspective than the counts alone. First, let’s calculate the total quantity of words, omitting stop words like “a” or “of.” The symbol totalWords can store that value for use in the final grid of results:

In[]:=

totalWords=Length[DeleteStopwords[wordList]]

Out[]=

1850
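
To see what DeleteStopwords removes, it can help to apply it to a short list first. The list sampleWords below is a made-up illustration, not the wordList from the earlier chapters:

In[]:=

sampleWords={"a","summary","of","strong","quarterly","growth"};
DeleteStopwords[sampleWords]

Common function words such as “a” and “of” should be dropped, so the Length of the result is smaller than the Length of the original list.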

Next, we can calculate the ratio of positive words to negative words through division. Because both counts are exact integers, Wolfram Language returns an exact result, which is a fraction in this case. The N function returns a numerical approximation, which is likely the better format for our grid of results. N takes an optional second argument that specifies the precision, here four digits:

In[]:=

posWordsLength/negWordsLength

Out[]=

In[]:=

ratio=N[posWordsLength/negWordsLength,4]

Out[]=

4.111
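
The effect of the second argument is easiest to see side by side. With the counts used in this chapter (37 positive words and 9 negative words), a sketch comparing the exact result, the default approximation, and the four-digit approximation is:

In[]:=

{37/9,N[37/9],N[37/9,4]}

Out[]=

{37/9,4.11111,4.111}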

We also might want to calculate positive words minus negative words, divided by total words. This gives a score that reflects both the balance of positive to negative words and how frequently those words occur in the PDF document. N can be used again to provide a numeric approximation with four digits:

In[]:=

tone=N[(posWordsLength-negWordsLength)/totalWords,4]

Out[]=

0.01514
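
If the same score is needed for more than one document, the calculation can be wrapped in a function. This is only a sketch: toneScore is a hypothetical name that does not appear in the earlier chapters, and the arguments are the counts from this chapter:

In[]:=

toneScore[pos_,neg_,total_]:=N[(pos-neg)/total,4]
toneScore[37,9,1850]

Out[]=

0.01514

The exact value here is 28/1850, which reduces to 14/925, so the four-digit approximation is 0.01514.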

With the results above, we can create a TextGrid expression with explanatory strings in the first column and the values in the second column. In earlier examples the result was already a nested list, with one inner set of curly brackets per row. Here we need to build that nested list ourselves to pair each label with its value:

In[]:=

TextGrid[{{"Quantity of Positive Words:",posWordsLength},{"Quantity of Negative Words:",negWordsLength},{"Ratio of Positive Words to Negative Words:",ratio},{"Total Quantity of Words:",totalWords},{"Tone of Words:",tone}},Frame->All]

Out[]=

Quantity of Positive Words:	37
Quantity of Negative Words:	9
Ratio of Positive Words to Negative Words:	4.111
Total Quantity of Words:	1850
Tone of Words:	0.01514
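
When the labels and values are kept in separate lists, the nested list does not have to be typed by hand. As a sketch, Transpose pairs each label with its value; labels and values are hypothetical names introduced here, and the other symbols are the ones defined above:

In[]:=

labels={"Quantity of Positive Words:","Quantity of Negative Words:","Ratio of Positive Words to Negative Words:","Total Quantity of Words:","Tone of Words:"};
values={posWordsLength,negWordsLength,ratio,totalWords,tone};
TextGrid[Transpose[{labels,values}],Frame->All]

This should produce the same grid as the previous input, since Transpose[{labels,values}] evaluates to the same list of label and value pairs.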

Summary

Wolfram Language and Wolfram Notebooks provide an excellent environment for creative data analysis. The built-in commands in Wolfram Language allow us to try completely new and creative ideas very quickly. Because an analysis like this one takes only a few short lines of code, the time commitment for trying a new idea can be far less than with other software or languages, which reduces the risk of experimenting.
