Lab 10: Working with Data

NetID:
Link to published notebook:

A Data Science Workflow

A data science project needs a flexible, modular, iterative and multiparadigm workflow.

Setting up Questions


Data Wrangling


Exploratory Data Analysis


Analyzing Data: Machine Learning can Help


Communicating Results


Part 1: EDA or Exploratory Data Analysis

In this lab, we will have a look at a couple of simple examples of working with data.

Background


Code for Exploratory Data Analysis

Load the Data


Problem 1: How many rows and columns are there?


Problem 2: Calculate descriptive statistics


Problem 3: Visual exploration using scatter plots


Problem 4: Visual exploration using histograms


Problem 5: Non-graphical exploration - find the correlation of the features.


Part 2: EDA of a Webpage of Your Choice

In this section we will perform EDA on a webpage of your choice to see how we can quickly get quantitative and visual information about this page.

Explore a webpage of your choice

Set the URL for the page you want to explore:
(*url="https://www.marinebio.net/marinescience/06future/abintro.htm";*)
url=
Import the text from the webpage:
In[]:=
pageText=Import[url]
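
As a quick sanity check after importing, it can help to look at the start of the text and see how long it is. The following is a minimal sketch, assuming the commented-out example URL above; the names exampleUrl and examplePageText are illustrative only and are not part of the lab. Import on an HTML page returns plain text by default, so the explicit "Plaintext" element is optional here.

```wolfram
(* Illustrative only: exampleUrl and examplePageText are hypothetical names *)
exampleUrl = "https://www.marinebio.net/marinescience/06future/abintro.htm";

(* Request the plain-text element of the page explicitly *)
examplePageText = Import[exampleUrl, "Plaintext"];

(* Peek at the first 300 characters (fewer if the page is shorter) *)
StringTake[examplePageText, UpTo[300]]

(* Count the words in the imported text *)
WordCount[examplePageText]
```

If the import fails or returns very little text, the page may block automated requests or load its content with scripts, in which case a different URL is likely the easier fix.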

Problem 6: What is the page talking about?


Problem 7: Analyze the text


Problem 8: What sort of pictures are found on the page?


Get WikipediaData on a topic related to your webpage for comparison

The following function gets the plain text from the Wikipedia page on a particular topic. Provide the topic of your article in the space between the quotes:
(*text=WikipediaData["Abalone"];*)
text=WikipediaData[" "];
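
To confirm that the lookup returned an article, a short inspection along these lines may be useful. This is a sketch, assuming the "Abalone" topic from the commented-out example; exampleText is an illustrative name and not part of the lab.

```wolfram
(* Illustrative only: exampleText is a hypothetical name *)
exampleText = WikipediaData["Abalone"];

(* Length of the article text in characters *)
StringLength[exampleText]

(* First two sentences of the article (fewer if the article is shorter) *)
Take[TextSentences[exampleText], UpTo[2]]
```

If the result is not a string, the topic name probably does not match a Wikipedia article title, so try a more specific or differently spelled topic.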

Problem 9: Create a word cloud


Problem 10: Find a specific type of entity used often on this page.


Part 3: Yet Another Survey

Thanks for being a fantastic class.
Please fill out this short survey so that I can improve this course: https://wolfr.am/ECE101FA23

Submitting your work

1. Publish your notebook
   1.1. From the cloud notebook, click on “Publish” at the top right corner.
   1.2. From the desktop notebook, use the menu option File -> Publish to Cloud
2. Copy the published link
3. Add it to the top of the notebook, below your netID
4. Print to PDF
5. Upload to Gradescope
6. Just to be sure, maybe ping your TA Sattwik on Slack that you have submitted.