"Unspelling", unique tetragrams, or Wolfram wordplay​
​by Bill Gosper
Depending on the exact definition of the word “word”, there are numerous four-letter sequences (tetragrams) that occur in only one English word. Given the tetragram, it is usually surprisingly difficult to figure out the word. An example that circulated recently was MEOW, which is already a word, but you somehow disallow it. Rich Schroeppel used to call this the game of HIPE, perhaps under the mistaken belief that worsHIPEd and worsHIPEr contain a double P. One that took me more than a
week was ACUR. I was similarly stuck on ABUR. The landlord’s kid looked it up and threatened to spoil it for me. And then, with mock cruelty, DID spoil it for me, correctly guessing that it was a locally prevalent tree that I’d never heard of, being new to California. Instantaneous conversion of annoyance to gratitude. These days, this never could happen, because we have tunABURgers!
Mathematica has a function named WordList:
In[]:=
Length[WordList[]]
Out[]=
39176
In[]:=
Length[WordList["KnownWords",IncludeInflectionsTrue]]
Out[]=
135364
You may find it amusing and instructive to find all the unique tetragrams known to Mathematica. Here is one approach
In[]:=
WordList[][[41;;43]]
Out[]=
{abhor,abhorrence,abhorrent}
These are Strings:
In[]:=
FullForm@%[[2]]
Out[]//FullForm=
"abhorrence"
Take four letters at a time:
In[]:=
Table[StringTake[%,{k,k+3}],{k,StringLength@%-3}]
Out[]=
{abho,bhor,horr,orre,rren,renc,ence}
Define a function to do this:
In[]:=
fours[s_String]:=Table[StringTake[s,{k,k+3}],{k,StringLength@s-3}]
In[]:=
fours@"foobar"
Out[]=
{foob,ooba,obar}
In[]:=
fours/@WordList[][[41;;43]]
Out[]=
{{abho,bhor},{abho,bhor,horr,orre,rren,renc,ence},{abho,bhor,horr,orre,rren,rent}}
whose distinct tetragrams are
In[]:=
Union@@%
Out[]=
{abho,bhor,ence,horr,orre,renc,rent,rren}
Find all (thirty) words containing ORRE:
In[]:=
Select[WordList[],StringMatchQ[#,___~~"orre"~~___]&]
Out[]=
{abhorrence,abhorrent,correct,correctable,corrected,correction,correctional,corrections,corrective,correctly,correctness,correlate,correlated,correlation,correlative,correspond,correspondence,correspondent,corresponding,correspondingly,horrendous,incorrect,incorrectly,incorrectness,sorrel,torrent,torrential,uncorrectable,uncorrected,uncorrelated}
MEOW Spoiler:
In[]:=
Select[WordList[],StringMatchQ[#,___~~"meow"~~___]&,3]
Out[]=
{homeowner,meow}
Find all the tetragrams in the 39176 list:
In[]:=
Union@@fours/@WordList[];
In[]:=
Length@%
Out[]=
26099
Warning: The following approach was very unKnuth—It took over an hour!
tetragramsWords=Select[Function[foo,Select[WordList[],StringMatchQ[#,___~~foo~~___]&,2]]/@%%,Length@#1&];
In[]:=
Length@%
Out[]=
8696
unique tetragrams!
The first nine words are
In[]:=
Flatten[Take[tetragramsWords,9]]
Out[]=
{ma'am,aardvark,self-abasement,abaft,bouillabaisse,cabdriver,abeam,arabesque,abeyance}
Right away, we have a puzzle:
In[]:=
WordData["aard"~~__]
Out[]=
{aardvark,aardwolf}
There are two words with AARD, so why is aardvark in the list?!
(Hint: The same reason aardwolf isn’t!)
The last eleven words are
In[]:=
Flatten[Take[tetragramsWords,-11]]
Out[]=
{buzzword,lazybones,zydeco,enzymatic,enzyme,huzzah,mezzanine,fuzzed,dizzily,razzmatazz,buzzword}
Puzzle number two: Why is buzzword in here twice??
Exercise: Repeat this derivation after purging the hyphenated and apostrophized words.
Hint:
In[]:=
tim[xp_]:=(Print[#[[1]],",",Length[#[[2]]]];#[[2]])&@AbsoluteTiming[xp](*timismyidentityfunctionthatinterjectselapsedtimeandlength.*)
In[]:=
SetAttributes[tim,HoldAll];
In[]:=
Union@@fours/@Select[WordList[],StringFreeQ[#,"-"|"'"]&]//tim;
0.503558,25651
(vs 26099)
Surprisingly, this
In[]:=
Select[Function[foo,Select[WordList[],StringMatchQ[#,___~~foo~~___]&,2]]/@%,Length@#1&]//tim
0.452913,25651
took “only” 25 minutes. And found 322 fewer unique tetragrams. —Bill
PS, do not confuse “tetragram” with the sacred, ineffable tetragrammaton, CNWH.