In this module, students will build a classifier that predicts the author of a text based on some training data. Appropriate for ages 12+. Allow 60 minutes to complete the module. Important note: This module should be led by an instructor with basic Wolfram Language knowledge. If you would like to learn the language, please try this free online introduction. If you would like a Computational Thinking Initiative ambassador or volunteer to help you run an adventure, please contact us.
Students will learn about machine learning capabilities by using the Classify function to detect the author of a text.
Computational Thinking Principles and Practices
Interpreting a problem or idea in a way that a computer can assist with it
Exploring entire categories all at once (e.g., looking at all flags or all of Shakespeare’s sonnets)
Simulating things that are hard or impossible to do by performing real-world experiments
AP Computer Science Principles:
LO 3.1.1: find patterns and test hypotheses about digitally processed information to gain insight and knowledge
Encourage students to assign their texts to variables for later use and to end each assignment with a semicolon, which hides the text and keeps their notebooks short and neat.
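A minimal sketch of this notebook habit. It assumes the built-in ExampleData texts stand in for whatever sources students chose; the variable names carrollText and darwinText are illustrative only.

```wolfram
(* The trailing semicolon suppresses the output, so the full text is not printed *)
carrollText = ExampleData[{"Text", "AliceInWonderland"}];
darwinText = ExampleData[{"Text", "OriginOfSpecies"}];

(* Peek at the first 200 characters to confirm that the text loaded *)
StringTake[carrollText, 200]
```

Showing only a short excerpt lets students confirm the import worked without filling the notebook with an entire book.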
◼ What authors did you choose?
◼ Your favorite author isn’t listed in the repository yet? How could you get text from another resource? You can even submit new resources so that everyone can use them!
Check that each student has chosen and imported some texts.
“Now that we have our texts, we can separate them into training and test sets. Later we will use the Classify function to teach the machine which texts belong to which authors. Classify is an example of machine learning. In some ways, machine learning is like human learning—the computer can’t learn the correct answer without some examples. If we want the computer to learn which author is which, we have to give it many examples of text and tell it which author wrote those examples. We call this our ‘training set.’ After the computer has learned using those examples, we are ready to test it to see how good it is at identifying the right author. We call the pieces of text we use for testing our ‘test set.’”
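The split, training, and testing steps described above could be sketched as follows. This is one possible approach, not the module's prescribed solution: the two ExampleData texts, the 80/20 split, the choice of sentences as examples, and the helper name splitAt are all assumptions made for illustration.

```wolfram
(* Assumed sample texts; students would use their own imported texts here *)
carrollText = ExampleData[{"Text", "AliceInWonderland"}];
darwinText = ExampleData[{"Text", "OriginOfSpecies"}];

(* Break each text into sentences so there are many examples per author *)
carrollSentences = TextSentences[carrollText];
darwinSentences = TextSentences[darwinText];

(* Hypothetical helper: first 80% of the sentences for training, the rest for testing *)
splitAt[list_] := TakeDrop[list, Floor[0.8 Length[list]]]
{carrollTrain, carrollTest} = splitAt[carrollSentences];
{darwinTrain, darwinTest} = splitAt[darwinSentences];

(* Train: each author label points to that author's training examples *)
authorClassifier = Classify[<|"Carroll" -> carrollTrain, "Darwin" -> darwinTrain|>];

(* Test: measure accuracy on sentences the classifier did not train on *)
ClassifierMeasurements[authorClassifier,
  <|"Carroll" -> carrollTest, "Darwin" -> darwinTest|>, "Accuracy"]
```

Once trained, the classifier can be applied to any new string, for example authorClassifier["It was a curious dream."], which returns the author label it considers most likely.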
Be sure that students do not have any overlap between their training and test sets.
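One quick way to check for overlap, shown here on hypothetical miniature sets (trainSet and testSet are placeholder names, and the repeated sentence is deliberate):

```wolfram
(* Hypothetical tiny sets; the test set repeats one training sentence on purpose *)
trainSet = {"Sentence one.", "Sentence two.", "Sentence three."};
testSet = {"Sentence three.", "Sentence four."};

(* Intersection lists the items that appear in both sets: {"Sentence three."} *)
Intersection[trainSet, testSet]

(* DisjointQ gives True only when the sets share nothing; here it gives False *)
DisjointQ[trainSet, testSet]
```

If the intersection is not empty, students should remove the shared items from one of the sets before training, because a classifier that is tested on text it has already seen will look more accurate than it really is.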