WOLFRAM NOTEBOOK

Social Media Analytics

Introduction

Examining the information available through social media can tell you a lot about your online presence. This is a quick exploration of some social media analytics, with examples using a personal Twitter account.

Getting Data

Meaningful analysis requires data from several sources. The first step is to connect to a social network, in this case Twitter. We’ll also pull in geographic data from the Wolfram Knowledgebase and data collected through an email survey.
Import a follower network from Twitter:
In[1]:=
network=ServiceExecute["Twitter","FollowerNetwork"]
Out[1]=
Grab entities for the cities in Santa Clara County:
In[2]:=
cities=GeoEntities[Entity["AdministrativeDivision",{"SantaClaraCounty","California","UnitedStates"}],"City"]
Out[2]=
{Cambrian Park, Fruitdale, Burbank, San Martin, Los Altos Hills, Sunol-Midtown, San Jose, Morgan Hill, Milpitas, Lexington Hills, Sunnyvale, Gilroy, East Foothills, Loyola, Santa Clara, Los Altos, Alum Rock, Mountain View, Cupertino, Seven Trees, Palo Alto, Saratoga, Monte Sereno, Buena Vista, Campbell, Los Gatos, Stanford}
Import local data from the survey:
In[3]:=
data=SemanticImport["surveydata.csv"]
Out[3]=
FollowerID   Location       Age      Transporation  Education Level  Industry
137017760    Mountain View  32.6763  Walk           Two-Year Degree  Technology
128882819    Campbell       33.4117  Bicycle        Two-Year Degree  Technology
32093372     Sunnyvale      43.2417  Bicycle        Bachelor's       Manufacturing
14132025     Morgan Hill    38.6501  Other          Two-Year Degree  Sales
77828589     Monte Sereno   33.1648  Bicycle        Bachelor's       Entertainment
showing 15 of 800
SemanticImport interprets data types automatically during import, so the Location values in this dataset are the same kind of city entities that GeoEntities returned from the Wolfram Knowledgebase. That makes it easy to cross-reference the two sources for quick analysis.
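As a quick check of that cross-referencing, the following sketch compares the survey locations with the list of city entities. It assumes `data` and `cities` are defined as in the cells above; the names `locations`, `matched`, and `unmatched` are introduced here only for illustration.

```wolfram
(* Distinct locations reported in the survey, as a plain list *)
locations = DeleteDuplicates[Normal[data[All, "Location"]]];

(* Survey locations that also appear in the Knowledgebase city list *)
matched = Intersection[locations, cities];

(* Survey locations with no counterpart in the city list *)
unmatched = Complement[locations, cities];

{Length[matched], Length[unmatched]}
```

If `unmatched` is not empty, those rows are the ones that the map in the next step would leave out.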
A map showing how many followers live in nearby cities:
In[4]:=
GeoBubbleChart[data[Select[MemberQ[cities,#Location]&]/*CountsBy[#Location&]]] (* plot options not shown *)
Out[4]=

Age: How Old Are Followers in the Network?

Looking at the age spread of a network can be a good starting point for understanding its makeup. The goal here is to compute an approximate numerical distribution of ages. Since the normal distribution has convenient analytic properties, we’ll try that first.
Find the closest normal distribution fit:
In[5]:=
ages=Normal[data[All,"Age"]];
FindDistribution[ages,TargetFunctions->{NormalDistribution},MaxItems->1]
Out[5]=
NormalDistribution[38.9873,5.49519]
Plot the actual distribution (red) compared with the estimate (yellow):
In[6]:=
Histogram[{ages,RandomVariate[%,Length[ages]]}] (* plot options not shown *)
Out[6]=
The normal distribution doesn’t quite match; the original data looks heavier on the left side than on the right. For this asymmetric shape, the skew normal distribution might be a better approximation. It has an additional parameter α that measures the skewness (slant) of the distribution.
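The asymmetry can also be checked numerically rather than by eye. The sketch below is one possible check, not part of the original notebook: it assumes `data` is the survey dataset imported earlier and reuses the normal parameters reported in Out[5].

```wolfram
(* Ages as a plain numeric list *)
ages = Normal[data[All, "Age"]];

(* Sample skewness: a value near 0 suggests symmetry; a positive value
   means most of the mass sits on the left with a longer right tail *)
Skewness[ages]

(* Goodness-of-fit p-value for the normal fit found above;
   a small p-value suggests the normal model is a poor match *)
DistributionFitTest[ages, NormalDistribution[38.9873, 5.49519]]
```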
Compare the formulas for the two distributions:
Out[7]=
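Normal: $\dfrac{e^{-\frac{(x-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma}$
Skew Normal: $\dfrac{e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\operatorname{erfc}\!\left(-\frac{\alpha(x-\mu)}{\sqrt{2}\,\sigma}\right)}{\sqrt{2\pi}\,\sigma}$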
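The input that produced this comparison is not shown in the notebook. One way to obtain the same two formulas, offered here as a sketch rather than the original cell, is to ask for the symbolic PDF of each distribution. The last line checks that the skew normal PDF reduces to the normal PDF when α is 0, since erfc(0) = 1.

```wolfram
(* Symbolic density of the normal distribution *)
PDF[NormalDistribution[μ, σ], x]

(* Symbolic density of the skew normal distribution *)
PDF[SkewNormalDistribution[μ, σ, α], x]

(* With α = 0 the extra erfc factor equals 1, so the two densities should agree *)
Simplify[PDF[SkewNormalDistribution[μ, σ, 0], x] == PDF[NormalDistribution[μ, σ], x]]
```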
Find starting parameters for the skew normal distribution:
In[8]:=
FindDistributionParameters[ages,SkewNormalDistribution[μ,σ,α]]
Out[8]=
{μ->35.1378,σ->9.66348,α->2.72919}
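To compare the two candidate models on the same footing, one option is to look at their log-likelihoods on the age data. This is a sketch under the assumption that `data` is the survey dataset from above; `normalFit`, `skewParams`, and `skewFit` are names introduced here for illustration.

```wolfram
(* Ages as a plain numeric list *)
ages = Normal[data[All, "Age"]];

(* Maximum-likelihood normal fit *)
normalFit = EstimatedDistribution[ages, NormalDistribution[μ, σ]];

(* Skew normal parameters as replacement rules, then the fitted distribution *)
skewParams = FindDistributionParameters[ages, SkewNormalDistribution[μ, σ, α]];
skewFit = SkewNormalDistribution[μ, σ, α] /. skewParams;

(* Higher log-likelihood indicates the better fit under this measure *)
LogLikelihood[#, ages] & /@ {normalFit, skewFit}
```

Because the skew normal has one more parameter than the normal, a modest improvement in log-likelihood is expected; a large gap is the more convincing sign that the extra parameter is earning its place.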
Adjust the sliders to verify the fit visually:
Out[9]=
(Interactive output: sliders for μ, σ, and α, shown at μ = 25, σ = 8, α = -4)
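The input cell for the interactive panel is not shown. A minimal sketch of such a panel, assuming `data` is the survey dataset from above, could overlay the skew normal density on a histogram of the ages; the slider ranges and starting values here are arbitrary choices, not the notebook's.

```wolfram
(* Ages as a plain numeric list *)
ages = Normal[data[All, "Age"]];

Manipulate[
 Show[
  (* Histogram scaled as a probability density so it is comparable to the PDF *)
  Histogram[ages, Automatic, "PDF"],
  (* Skew normal density for the current slider values *)
  Plot[PDF[SkewNormalDistribution[μ, σ, α], x], {x, Min[ages], Max[ages]},
   PlotStyle -> Red]],
 {{μ, 35}, 20, 50},
 {{σ, 10}, 1, 20},
 {{α, 3}, -5, 5}]
```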