WOLFRAM NOTEBOOK

Lab 8: Neural Networks and Authentication

NetID:
Link to published notebook:
In this lab, we will have a look at generating text using neural networks. We will also look at hash functions and symmetric & asymmetric key cryptography.

Part 1: Neural Networks to Generate Text

The GPT2 Transformer is a neural net model trained on WebText Data (a preliminary version of the OpenWebText dataset, consisting of 40 GB of text scraped from webpages curated by humans) to generate text in English and represent text as a sequence of vectors.
A version of this neural net model is available in the Wolfram Neural Net Repository at https://resources.wolframcloud.com/NeuralNetRepository/resources/GPT2-Transformer-Trained-on-WebText-Data/
This model consists of a family of individual neural nets, each identified by a specific parameter combination. Inspect the available parameters:
In[]:=
NetModel["GPT2 Transformer Trained on WebText Data","ParametersInformation"]
Retrieve the language model by specifying the “Task” parameter:
In[]:=
lm=NetModel[{"GPT2 Transformer Trained on WebText Data","Task"->"LanguageModeling"}]
Out[]=
NetChain  (Input port: string; Output port: class)
What information is available about this particular model?
In[]:=
Information[lm]
What does the network graph look like?
In[]:=
Information[lm,"FullSummaryGraphic"]
How many layers does this neural network have?
In[]:=
Information[lm,"LayersCount"]

Problem 1

Follow the instructions below and generate three different samples of text to follow the prompt “Albert Einstein was a German-born theoretical physicist”.
Use the model itself to predict the next word in a given sequence:
In[]:=
lm["Albert Einstein was a German-born theoretical physicist"]
Obtain the top 10 probabilities to follow the prompt:
In[]:=
topProbs=lm["Albert Einstein was a German-born theoretical physicist",{"TopProbabilities",10}]
Here’s what happens if one repeatedly “applies the model” (say 10 times)—at each step adding the word that has the top probability (specified in this code as the “Decision” from the model):
In[]:=
NestList[StringJoin[#,lm[#,"Decision"]]&,"Albert Einstein was a German-born theoretical physicist who developed",10]//Column
The following function was created to help predict a number of words that can follow a text prompt, according to the “temperature” set for the model:
In[]:=
generateSample[input_String,numTokens_:10,temperature_:1]:=Nest[Function[StringJoin[#,lm[#,{"RandomSample","Temperature"->temperature}]]],input,numTokens];
To use the function generateSample, you provide:
1) an input string, which is the prompt
2) the number of tokens (words) to generate (optional, defaulting to 10)
3) the “temperature” (optional, defaulting to 1): a high temperature flattens the distribution from which tokens are sampled, increasing the probability of sampling less likely tokens.
Generate the next 20 words by using it on a piece of text:
In[]:=
generateSample["Albert Einstein was a German-born theoretical physicist who developed",20]
Generate the next 20 words by using it on a piece of text, at the specified temperature of 0.8:
In[]:=
generateSample["Albert Einstein was a German-born theoretical physicist who developed",20,.8]
There are a lot of possible “next words” to choose from (at temperature 0.8), though their probabilities fall off quite quickly (the straight line on this log-log plot corresponds to an n^(-1) “power-law” decay that’s very characteristic of the general statistics of language):
Out[]=
(log-log plot of next-word probability vs. rank; graphic not preserved in this export)
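A sketch of how such a plot could be reproduced from the model (illustrative only; this is not necessarily the code that generated the figure above, and the choice of the top 100 probabilities is an assumption):
In[]:=
topProbs100=lm["Albert Einstein was a German-born theoretical physicist who developed",{"TopProbabilities",100}];
ListLogLogPlot[Values[topProbs100],AxesLabel->{"rank","probability"}]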
Sample 1

Generate 40 words at a temperature of 1.5:
In[]:=
generateSample["Albert Einstein was a German-born theoretical physicist",40,1.5]

Sample 2

Increase the temperature, but remember that very high temperature settings are equivalent to random sampling:
In[]:=
generateSample["Albert Einstein was a German-born theoretical physicist",40,10]

Sample 3

Decrease the temperature (very low temperature settings are equivalent to always picking the token with maximum probability, and it is typical for sampling to “get stuck in a loop”):
In[]:=
generateSample["Albert Einstein was a German-born theoretical physicist",40,0.01]

Problem 2

Play around with the “temperature” value and try to generate a reasonable segment of text (100 tokens in length) to follow the prompt “A long time ago in a galaxy far, far away ”. Comment on your choice for the value of temperature that worked well for you.

Answer

(* write your code here *)
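One possible way to explore this (a sketch; the particular temperature values below are just illustrative starting points, not the required answer):
In[]:=
prompt="A long time ago in a galaxy far, far away ";
Column[Table[Row[{"temperature ",t,": ",generateSample[prompt,100,t]}],{t,{0.6,0.8,1.0,1.2}}]]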

Part 2: Hash Functions

Hash functions take an arbitrarily long, but finite, input and produce a fixed-size output based on that input:
In[]:=
book=ExampleData[{"Text","AliceInWonderland"}];
Hash[book]
The result of applying a hash function to the input data is called a digest (hash value, hash code, hash checksum). This digest serves as a reproducible representation of the input that detects any accidental or intentional change. Importantly, a digest is also very compact:
In[]:=
ByteCount[book]
In[]:=
ByteCount[Hash[book]]
Hash functions are commonly used to authenticate data integrity: to verify that the data received is indeed the data that was sent, with no alterations, or that “copies” of data stored in different locations are in fact the same.
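For instance, changing even a single character of the text produces a completely different digest, which is why comparing digests catches alterations (a minimal illustration using the book variable defined above; the particular substitution is arbitrary):
In[]:=
tampered=StringReplace[book,"Alice"->"Alyce",1];
{Hash[book],Hash[tampered],Hash[book]===Hash[tampered]}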

Problem 3

Alice has to send her fellow scientist Bob a novel secret formula their lab has developed, so she attaches a file to an email and provides its “digest” or “hash checksum”:
Dear Bob, please find attached the molecule plot. The hash checksum is 2654733945256440784.
In[]:=
molecule= (* 3D molecule graphic from the original notebook; not preserved in this export *);
Calculate the hash of the 3D “molecule” above.
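Assuming the graphic above has been assigned to the variable molecule, the check is a single call to Hash, whose result can then be compared against the digest in Alice’s email (a sketch):
In[]:=
Hash[molecule]
Hash[molecule]===2654733945256440784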

Answer

Problem 4

Is the calculated hash value the same as what Alice sent? If not, what might be the reason?

Answer

Submitting your work

1. Publish your notebook
1.1. From the cloud notebook, click on “Publish” at the top right corner.
1.2. From the desktop notebook, use the menu option File -> Publish to Cloud.
2. Copy the published link.
3. Add it to the top of the notebook, below your NetID.
4. Print to PDF.
5. Upload to Gradescope.
6. Just to be sure, maybe ping your TA Sattwik on Slack that you have submitted.