Web Scraping and Facial Expression Analysis
10 | How to Scrape the Web for Images
Introduction
In previous chapters, we used the Import command in Wolfram Language to scrape the web for textual data. In this section, we use string processing to build a search URL, which lets us scrape the web in a much more targeted manner. Importing that URL gives us a collection of images that we can analyze all at once.
Building a URL for Web Scraping
Strings in Wolfram Language are text (words or phrases), and there are many built-in commands for working with them. One of these is StringJoin, which takes several strings and combines them into a single string. Another is StringReplace, which we use here to replace spaces with the plus symbol, the convention Google uses in its search URLs. We can reuse the same code to analyze several CEOs by defining the symbols ceo and company separately, so that only those two definitions need to change:
ceo="Margaret Keane";company="Synchrony";
First, let’s build a single string, in the format Google expects, to search for images:
In[]:=
StringJoin["https://www.google.com/search?q=",StringReplace[ceo," "->"+"],"+CEO+",company,"&source=lnms&tbm=isch"]
Out[]=
https://www.google.com/search?q=Margaret+Keane+CEO+Synchrony&source=lnms&tbm=isch
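The same StringJoin and StringReplace pattern can be wrapped in a small function so that the URL is rebuilt whenever ceo or company changes. This is only a sketch: the name imageSearchURL is a hypothetical helper, not a built-in command, and the second call uses made-up placeholder names. It also applies StringReplace to the company name, on the assumption that some company names contain spaces.

```wolfram
(* Hypothetical helper (not a built-in): builds the Google image search URL
   using the same StringJoin/StringReplace pattern shown above. *)
imageSearchURL[ceo_String, company_String] := StringJoin[
  "https://www.google.com/search?q=",
  StringReplace[ceo, " " -> "+"],
  "+CEO+",
  StringReplace[company, " " -> "+"], (* assumption: company names may contain spaces *)
  "&source=lnms&tbm=isch"
]

(* The example from the text *)
imageSearchURL["Margaret Keane", "Synchrony"]

(* Placeholder names, for illustration only *)
imageSearchURL["Jane Doe", "Example Corp"]
```

For the values of ceo and company defined earlier, the first call reproduces the URL shown in the output above.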
Once the URL is built to search for particular images, we can pass the same expression to Import with the "Images" element, which returns a list of the images found on the results page:
In[]:=
allPictures=Import[StringJoin["https://www.google.com/search?q=",StringReplace[ceo," "->"+"],"+CEO+",company,"&source=lnms&tbm=isch"],"Images"]
Out[]=
(a list of images returned by the search)
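Before analyzing a scraped list, it can help to check how many images came back and how large they are, since a results page may also contain small icons. The sketch below runs without a web connection by using random stand-in images in place of allPictures. The sizes, the 50-pixel threshold, and the name keepLargeImages are all assumptions chosen for illustration.

```wolfram
(* Stand-ins for the list returned by Import[..., "Images"];
   the sizes are made up for illustration. *)
samplePictures = {
  RandomImage[1, {120, 90}],
  RandomImage[1, {16, 16}],
  RandomImage[1, {200, 150}]
};

(* Hypothetical helper: keep only images whose shorter side
   is at least minSide pixels. *)
keepLargeImages[images_List, minSide_Integer] :=
  Select[images, Min[ImageDimensions[#]] >= minSide &]

(* How many images are in the list, and how large is each one? *)
Length[samplePictures]
ImageDimensions /@ samplePictures

(* Drop the 16x16 stand-in; the two larger images remain *)
largePictures = keepLargeImages[samplePictures, 50];
Length[largePictures]
```

With a real scrape, allPictures would take the place of samplePictures, and the threshold would need adjusting to the images actually returned.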
Summary
In previous chapters, we searched for specific images, copied them manually, then pasted them into the Mathematica notebook for analysis. This approach automates the process and gives us a large number of images without those manual steps.