In[]:=
Clear["Global`*"];
(* a simple function to detect which environment we're in -- the local machine or the cloud *)
detectCloud[] := StringContainsQ[NotebookDirectory[], "wolframcloud"];
Data Collection: Decennial Population and Apportionment
Data Collection: Decennial Population and Apportionment
Sources
Sources
◼
◼
We’ll start with the state abbreviations from the Post Office
We’ll start with the state abbreviations from the Post Office
In[]:=
table = Import["http://about.usps.com/who-we-are/postal-history/state-abbreviations.htm", {"HTML", "Data",2}][[2]];
Print[Grid[table[[2;;6]], FrameAll]];
1831 | 1874 | 1943 | 1963 | |
Alabama | Al. | Ala. | Ala. | AL |
Alaska | -- | Alaska | Alaska | AK |
Arizona | -- | Ariz. | Ariz. | AZ |
Arkansas | Ar. T. | Ark. | Ark. | AR |
Let's toss in entities to the output, since we can be sure they’re uniform across different sources, spellings and punctuations
In[]:=
data = { StringTrim[First@#], StringTrim[Last@#] }& /@ Rest@table;
data = Select[data, First@# ≠ "" && First@# ≠ "Puerto Rico"&];
states = <| "name" First@#, "abbr" Last@#, "entity" Interpreter["USState"][First@#] |>& /@ data;
And make an association of the associations
In[]:=
statesAssociation = AssociationThread[#["name"]& /@ states, states];
We’ll need to correct Nebraska:

In[]:=
statesAssociation["Nebraska"]["abbr"] = "NE";
Now let’s get decennial Census data.
Now let’s get decennial Census data.
While Mathematica has population conveniently built in to state entities, let’s be extra careful and use the same figures the U.S. used for apportionment.
We can get the table data from the URL at the top of the document directly as a CSV to save some time.
In[]:=
csv = Import["https://www2.census.gov/programs-surveys/decennial/tables/2010/2010-apportionment/pop_change.csv", "CSV"];
data = Most@Take[csv, -52];
header = csv[[3]]
Out[]=
{STATE_OR_REGION,1910_POPULATION,1920_POPULATION,1930_POPULATION,1940_POPULATION,1950_POPULATION,
1960_POPULATION,1970_POPULATION,1980_POPULATION,1990_POPULATION,2000_POPULATION,2010_POPULATION,
1910_CHANGE,1920_CHANGE,1930_CHANGE,1940_CHANGE,1950_CHANGE,1960_CHANGE,1970_CHANGE,1980_CHANGE,
1990_CHANGE,2000_CHANGE,2010_CHANGE}
Let’s clean up the header, reducing to just to the populations -- we can calculate the change ourselves if we ever need it.
In[]:=
shortHeader = { "name", "1910", "1920", "1930", "1940", "1950", "1960", "1970", "1980", "1990", "2000", "2010" };
And map the header to the data objects
In[]:=
population = Map[AssociationThread[shortHeader, #[[1;;12]]]&, data];
Grid[Prepend[Values /@ population[[1;;4]], shortHeader], FrameAll, Spacings{0.6,1}]
Out[]=
name | 1910 | 1920 | 1930 | 1940 | 1950 | 1960 | 1970 | 1980 | 1990 | 2000 | 2010 |
Alabama | 2138093 | 2348174 | 2646248 | 2832961 | 3061743 | 3266740 | 3444165 | 3893888 | 4040587 | 4447100 | 4779736 |
Alaska | 64356 | 55036 | 59278 | 72524 | 128643 | 226167 | 300382 | 401851 | 550043 | 626932 | 710231 |
Arizona | 204354 | 334162 | 435573 | 499261 | 749587 | 1302161 | 1770900 | 2718215 | 3665228 | 5130632 | 6392017 |
Arkansas | 1574449 | 1752204 | 1854482 | 1949387 | 1909511 | 1786272 | 1923295 | 2286435 | 2350725 | 2673400 | 2915918 |
Looks good. Let’s connect to the object from the Post Office
In[]:=
statesAssociation = AssociationThread[
Keys@statesAssociation,
Append[statesAssociation[#["name"]], "population_decennial" Rest@#]& /@ population
];
In[]:=
Print[statesAssociation["Alaska"]];
Print[statesAssociation["Arkansas"]];
nameAlaska,abbrAK,entity,population_decennial191064356,192055036,193059
278,194072524,1950128643,1960226167,1970300382,1980401851,1990550043,2000626932,2010710
231
nameArkansas,abbrAR,entity,population_decennial19101574449,19201752204,
19301854482,19401949387,19501909511,19601786272,19701923295,19802286435,19902350725,
20002673400,20102915918
And now the apportionment data
And now the apportionment data