Wolfram Research

BERT Trained on BookCorpus and English Wikipedia Data

Represent text as a sequence of vectors

Released in 2018, Bidirectional Encoder Representations from Transformers (BERT) is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right contexts in all layers. As a result, the pre-trained model can be fine-tuned with an additional output layer to create state-of-the-art models for a wide range of tasks. It uses bidirectional self-attention, often referred to as a transformer encoder.

Number of models: 8

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["BERT Trained on BookCorpus and English Wikipedia Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[2]:=
NetModel["BERT Trained on BookCorpus and English Wikipedia Data", \
"ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the parameters:

In[3]:=
NetModel[{"BERT Trained on BookCorpus and English Wikipedia Data", 
  "Type" -> "LargeUncased", "InputType" -> "ListOfStrings"}]
Out[3]=
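
The "ListOfStrings" input type is intended for tasks whose input consists of more than one text segment, such as sentence-pair classification or question answering. As a minimal sketch, assuming this variant encodes the given strings jointly into a single token sequence, it can be evaluated on a pair of sentences:

(* Sketch: evaluate the "ListOfStrings" variant on a sentence pair.
   Assumes the two strings are jointly encoded into one token sequence. *)
bertPair = NetModel[{"BERT Trained on BookCorpus and English Wikipedia Data", 
    "Type" -> "LargeUncased", "InputType" -> "ListOfStrings"}];
pairEmbeddings = 
  bertPair[{"What is the capital of France?", 
    "Paris is the capital of France."}];
Dimensions[pairEmbeddings] (* tokens x 1024 for the "Large" nets *)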

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"BERT Trained on BookCorpus and English Wikipedia Data", 
  "Type" -> "BaseCased", 
  "InputType" -> "ListOfStrings"}, "UninitializedEvaluationNet"]
Out[4]=

Basic usage

Given a piece of text, the default BERT net produces a sequence of feature vectors of size 768 (the "Large" nets produce vectors of size 1024), one for each input subword token:

In[5]:=
input = "Hello world! I am here";
embeddings = 
  NetModel["BERT Trained on BookCorpus and English Wikipedia Data"][
   input];

Obtain dimensions of the embeddings:

In[6]:=
Dimensions@embeddings
Out[6]=
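
The number of feature vectors matches the number of subword tokens produced by the net's tokenizer, including special markers, which can exceed the number of whitespace-separated words. As a sketch, assuming the tokenizer is exposed as the net's input NetEncoder, it can be extracted and applied on its own:

(* Sketch: inspect how the input text is split into subword tokens.
   Assumes the tokenizer is attached to the net as its "Input" NetEncoder. *)
bert = NetModel["BERT Trained on BookCorpus and English Wikipedia Data"];
tokenizer = NetExtract[bert, "Input"];
tokenizer["Hello world! I am here"] (* integer codes, one per subword token *)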

Visualize the embeddings:

In[7]:=
MatrixPlot@embeddings
Out[7]=
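
As noted in the description, BERT is typically fine-tuned by attaching a small task-specific head on top of the contextual embeddings. The following is a minimal sketch of a two-class text classifier built on the pre-trained net; the pooling strategy (taking the last token vector) and the class labels are illustrative assumptions, not part of this resource:

(* Sketch: attach a simple classification head to the pre-trained encoder.
   Pooling via SequenceLastLayer and the two class labels are illustrative choices. *)
bert = NetModel["BERT Trained on BookCorpus and English Wikipedia Data"];
classifier = NetChain[{
    bert,                (* string -> sequence of 768-dimensional vectors *)
    SequenceLastLayer[], (* pool the sequence into a single vector *)
    LinearLayer[2],      (* task-specific output layer *)
    SoftmaxLayer[]
   },
   "Output" -> NetDecoder[{"Class", {"negative", "positive"}}]
  ];
(* The head, and optionally the encoder, can then be trained with NetTrain
   on labeled examples of the form "text" -> class. *)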

Requirements

Wolfram Language 12.0 (April 2019) or above

Resource History

Reference