The objective of this talk is getting you started with the main concepts of working with RelationalDatabases through the Entity framework in the Wolfram Language.

Outline

◼
  • Philosophy
  • ◼
  • Classes of entities
  • ◼
  • Inspection
  • ◼
  • Querying
  • ◼
  • Relations
  • ◼
  • EmbeddedSQL and ExternalEvaluate
  • ◼
  • Q&A
  • Philosophy

    EntityValue[class,properties]

    First argument: EntityClass (and related heads)

    ◼
  • Fully inert and symbolic
  • ◼
  • Represents all the data
  • ◼
  • Rows
  • Second argument: EntityProperty (usually plural)

    ◼
  • Projection
  • ◼
  • Columns
  • Principles

    ◼
  • Resolution only happens when EntityValue is called
  • ◼
  • EntityList considered dangerous (/s)
  • ◼
  • 1 EntityValue = 1 SQL query
  • Classes of entities

    EntityClass

    Legacy, can do basic filtering and sorting

    FilteredEntityClass

    Full-fledged filtering (WHERE)

    ExtendedEntityClass

    Annotations, renamings

    SampledEntityClass

    Roughly follows the Take syntax (LIMIT, OFFSET)

    SortedEntityClass

    Like SortBy (ORDER BY)

    AggregatedEntityClass

    2-ary version performs aggregations, 3-argument version groups and aggregates (GROUP BY)

    CombinedEntityClass

    Similar to JoinAcross, but supports more complex statements (JOIN)

    UnionedEntityClass

    Most complex set operation, supports SameTestProperties (UNION, UNION ALL, DISTINCT ON)

    IntersectedEntityClass

    Also supports SameTestProperties (INTERSECT)

    ComplementedEntityClass

    Also supports SameTestProperties (EXCEPT)

    Classes of entities (cont.d)

    ◼
  • Like Lego: each class of entities is takes one (or more) classes of entities as arguments
  • ◼
  • Completely inert and symbolic
  • ◼
  • You can reason about their EntityProperties without executing
  • Inspection

    In[]:=
    schema=RelationalDatabase[FindFile["ExampleData/ecommerce-database.sqlite"]]
    Out[]=
    RelationalDatabase
    Table count: 8
    Backend: SQLite
    Tables:
    productlines
    payments
    offices
    products
    orderdetails
    employees
    orders
    customers
    

    Top-level properties

    In[]:=
    schema["Properties"]
    Out[]=
    {Tables,Connection}
    In[]:=
    schema["Tables"]
    Out[]=
    {productlines,payments,offices,products,orderdetails,employees,orders,customers}

    Table-level properties

    In[]:=
    schema["orders","Properties"]
    Out[]=
    {Columns,ForeignKeys,Name,PrimaryKey,UniquenessConstraints,Indexes}
    In[]:=
    schema["orders","Columns"]
    Out[]=
    {orderNumber,orderDate,requiredDate,shippedDate,status,comments,customerNumber}

    Column-level properties

    In[]:=
    schema["orders","orderNumber","Properties"]
    Out[]=
    {NativeTypeString,Default,Indexed,Name,Nullable}
    In[]:=
    schema["orders","orderNumber","NativeTypeString"]
    Out[]=
    INTEGER

    Inspection: EntityStore

    In[]:=
    store=EntityStore[schema]
    Out[]=
    ◼
  • Bold properties are primary keys ⇒ CanonicalName
  • ◼
  • Underlined properties are relations ⇒ Entity- or EntityClass-valued properties
  • ◼
  • → means one-to-one (outgoing foreign key)
  • ◼
  • ← means one-to-many (incoming foreign key)
  • In[]:=
    EntityRegister[store]
    Out[]=
    {productlines,payments,offices,products,orderdetails,employees,orders,customers}

    Querying

    In[]:=
    EntityValue["offices","officeCode"]
    Out[]=
    {1,2,3,4,5,6,7}
    In[]:=
    EntityValue["offices",{"city","country"},"PropertyAssociation"]
    Out[]=
    {citySan Francisco,countryUSA,cityBoston,countryUSA,cityNYC,countryUSA,cityParis,countryFrance,cityTokyo,countryJapan,citySydney,countryAustralia,cityLondon,countryUK}
    In[]:=
    EntityValue[FilteredEntityClass["offices",EntityFunction[x,x["country"]"USA"]],"city"]
    Out[]=
    {San Francisco,Boston,NYC}
    In[]:=
    EntityValue["offices",{"state",EntityFunction[o,MissingQ[o["state"]]]}]
    Out[]=
    {{CA,False},{MA,False},{NY,False},{Missing[NotAvailable],True},{Chiyoda-Ku,False},{Missing[NotAvailable],True},{Missing[NotAvailable],True}}

    Querying (composition)

    Count how many customers per city and country we have
    In[]:=
    aggr=AggregatedEntityClass[​​"customers","customerCount"EntityFunction[c,Length[c["customerNumber"]]],​​{"city","country"}​​];
    In[]:=
    EntityValue[aggr,"PropertyAssociation"]
    Well, this is unsorted, let’s sort it:
    In[]:=
    sort=SortedEntityClass[​​aggr,​​"customerCount""Descending"​​];
    In[]:=
    EntityValue[​​sort,​​"PropertyAssociation"​​]
    Out[]//Shallow=
    
    customerCount
    5,EntityProperty[customers,city]Madrid,EntityProperty[customers,country]Spain,
    customerCount
    5,EntityProperty[customers,city]NYC,EntityProperty[customers,country]USA,
    customerCount
    3,EntityProperty[customers,city]Brickhaven,EntityProperty[customers,country]USA,
    customerCount
    3,EntityProperty[customers,city]Paris,EntityProperty[customers,country]France,
    customerCount
    3,EntityProperty[customers,city]Singapore,EntityProperty[customers,country]Singapore,
    customerCount
    2,EntityProperty[customers,city]Auckland ,EntityProperty[customers,country]New Zealand,
    customerCount
    2,EntityProperty[customers,city]Boston,EntityProperty[customers,country]USA,
    customerCount
    2,EntityProperty[customers,city]Cambridge,EntityProperty[customers,country]USA,
    customerCount
    2,EntityProperty[customers,city]Frankfurt,EntityProperty[customers,country]Germany,
    customerCount
    2,EntityProperty[customers,city]Glendale,EntityProperty[customers,country]USA,86

    Querying (what happens under the hood?)

    In[]:=
    Databases`Database`$DBQueryLogger=Echo;
    In[]:=
    EntityValue["offices",{"state",EntityFunction[o,MissingQ[o["state"]]]}]
    »
    SELECT "offices_T20".state AS state, ("offices_T20".state IS NULL) AS synthetic_prop_1 FROM offices AS "offices_T20"
    Out[]=
    {{CA,False},{MA,False},{NY,False},{Missing[NotAvailable],True},{Chiyoda-Ku,False},{Missing[NotAvailable],True},{Missing[NotAvailable],True}}
    Who is making orders on Sundays?
    In[]:=
    EntityValue[FilteredEntityClass["orders",EntityFunction[o,DateValue[FromUnixTime[o["orderDate"]],"DayName"]==="Sunday"]],EntityFunction[o,o["customers"]["customerName"]]]
    »
    SELECT (SELECT "customers_T67"."customerName" AS "customerName" FROM customers AS "customers_T67" WHERE "customers_T67"."customerNumber" = "orders_T64"."customerNumber") AS synthetic_prop_7 FROM orders AS "orders_T64" WHERE CASE WHEN (0.0 + "orders_T64"."orderDate" IS NULL) THEN NULL WHEN ((CAST(strftime('%w', 0.0 + "orders_T64"."orderDate", 'unixepoch') AS INTEGER) + 6) % 7 + 1 = 1) THEN 'Monday' WHEN ((CAST(strftime('%w', 0.0 + "orders_T64"."orderDate", 'unixepoch') AS INTEGER) + 6) % 7 + 1 = 2) THEN 'Tuesday' WHEN ((CAST(strftime('%w', 0.0 + "orders_T64"."orderDate", 'unixepoch') AS INTEGER) + 6) % 7 + 1 = 3) THEN 'Wednesday' WHEN ((CAST(strftime('%w', 0.0 + "orders_T64"."orderDate", 'unixepoch') AS INTEGER) + 6) % 7 + 1 = 4) THEN 'Thursday' WHEN ((CAST(strftime('%w', 0.0 + "orders_T64"."orderDate", 'unixepoch') AS INTEGER) + 6) % 7 + 1 = 5) THEN 'Friday' WHEN ((CAST(strftime('%w', 0.0 + "orders_T64"."orderDate", 'unixepoch') AS INTEGER) + 6) % 7 + 1 = 6) THEN 'Saturday' WHEN ((CAST(strftime('%w', 0.0 + "orders_T64"."orderDate", 'unixepoch') AS INTEGER) + 6) % 7 + 1 = 7) THEN 'Sunday' ELSE NULL END = 'Sunday'
    Out[]=
    {Mini Creations Ltd.,Oulu Toy Supplies, Inc.,Euro+ Shopping Channel,Vida Sport, Ltd,Norway Gifts By Mail, Co.,Classic Legends Inc.,Mini Gifts Distributors Ltd.,Tekni Collectables Inc.,Québec Home Shopping Network,Souveniers And Things Co.,Mini Gifts Distributors Ltd.}
    In[]:=
    Databases`Database`$DBQueryLogger=.

    Relations

    What’s the overall structure of the database?
    In[]:=
    Graph[Join@@(Thread[#schema[#,"ForeignKeys","ToTable"]]&/@schema["Tables"]),VertexLabels"Name"]
    Out[]=
    In[]:=
    EntityStores[][[1]]
    Out[]=
    Here is a naked column:
    In[]:=
    EntityValue["employees","officeCode"]
    Out[]=
    {1,1,1,6,4,1,1,1,2,2,3,3,4,4,4,7,7,6,6,6,5,5,4}
    Here is the same column expressed as an Entity:
    In[]:=
    EntityValue["employees","offices"]
    Out[]=
    
    1
    ,
    1
    ,
    1
    ,
    6
    ,
    4
    ,
    1
    ,
    1
    ,
    1
    ,
    2
    ,
    2
    ,
    3
    ,
    3
    ,
    4
    ,
    4
    ,
    4
    ,
    7
    ,
    7
    ,
    6
    ,
    6
    ,
    6
    ,
    5
    ,
    5
    ,
    4
    
    And the reverse relation:
    In[]:=
    EntityValue["offices","employees"]
    Out[]=
    {EntityClass[employees,{officeCode1}],EntityClass[employees,{officeCode2}],EntityClass[employees,{officeCode3}],EntityClass[employees,{officeCode4}],EntityClass[employees,{officeCode5}],EntityClass[employees,{officeCode6}],EntityClass[employees,{officeCode7}]}
    A more complicated example:
    In[]:=
    EntityFunction[c,c["employees"]["firstName"]<>" "<>c["employees"]["lastName"]<>" is the sales representative for "<>c["customerName"]<>". Their manager is "<>c["employees"]["employees-reportsTo"]["firstName"]<>" "<>c["employees"]["employees-reportsTo"]["lastName"]][Entity["customers",103]]
    Out[]=
    Gerard Hernandez is the sales representative for Atelier graphique. Their manager is Gerard Bondur
    Databases`Database`$DBQueryLogger=Echo;
    The same thing without relations
    CombinedEntityClass[CombinedEntityClass["customers","employees","salesRepEmployeeNumber"->"employeeNumber"],"manager"->"employees",{"reportsTo"->"employeeNumber"}][EntityFunction[c,c[EntityProperty["employees","firstName"]]<>" "<>c[EntityProperty["employees","lastName"]]<>" is the sales representative for "<>c["customerName"]<>". Their manager is "<>c[EntityProperty["manager"->"employees","firstName"]]<>" "<>c[EntityProperty["manager"->"employees","lastName"]]]]//First
    Aggregation-less aggregations
    In[]:=
    EntityFunction[o,Total[o["orderdetails"]["priceEach"]*o["orderdetails"]["quantityOrdered"]]][Entity["orders",10100]]
    Out[]=
    10223.8

    EmbeddedSQL and ExternalEvaluate

    Useful for things we don’t or can’t support
    In[]:=
    EntityValue[Entity["offices","1"],EntityFunction[x,EmbeddedSQLExpression["date('now','start of year','+6 months','weekday 0')"]]]
    Out[]=
    2023-07-02
    Supports binding things from the outer query
    In[]:=
    EntityValue["offices",{EntityFunction[x,EmbeddedSQLExpression["glob('U*',``)",{x["country"]}]],"country"}]
    Out[]=
    {{1,USA},{1,USA},{1,USA},{0,France},{0,Japan},{0,Australia},{1,UK}}
    What if I want to have query-level EmbeddedSQL?
    In[]:=
    EmbeddedSQLEntityClass["WITH RECURSIVE fact_i (n, fct) AS ( VALUES (0, 1) UNION ALL SELECT n+1, fct * (n+1) FROM fact_i WHERE n < 10)SELECT * FROM fact_i",{"n","fct"}]//EntityProperties
    Out[]=
    {EntityProperty[EmbeddedSQLEntityClass[WITH RECURSIVE fact_i (n, fct) AS ( VALUES (0, 1) UNION ALL SELECT n+1, fct * (n+1) FROM fact_i WHERE n < 10)SELECT * FROM fact_i,{n,fct}],n],EntityProperty[EmbeddedSQLEntityClass[WITH RECURSIVE fact_i (n, fct) AS ( VALUES (0, 1) UNION ALL SELECT n+1, fct * (n+1) FROM fact_i WHERE n < 10)SELECT * FROM fact_i,{n,fct}],fct]}
    Really arbitrary stuff:
    In[]:=
    ExternalEvaluate[DatabaseReference["sqlite://"],{"CREATE TABLE developers (name TEXT NON NULL, awesome INTEGER)","INSERT INTO developers (name, awesome) VALUES (\"carlo barbieri\", 1), (\"donald knuth\", 0)","SELECT name FROM developers where awesome"}]
    Out[]=
    Null,Null,
    name
    carlo barbieri
    

    CITE THIS NOTEBOOK

    Wolfram R&D LIVE: Relational Databases in the Wolfram Language​
    by Carlo Barbieri​
    Wolfram Community, STAFF PICKS, April 25 2023
    ​https://community.wolfram.com/groups/-/m/t/2907390