Implementing Inheritance and Relationships

Implementing Inheritance and Relationships

Leveraging the power of abstracted content types to introduce object-oriented concepts

This is the third posting in the series Implementing a Content Database In 10 Steps. Please see previous posts for more background on our working example here.

Inheritance and Extending Content Types

One powerful feature of Content Databases is the ability to exploit Object-Oriented features like Inheritance. In our previous post we introduced the concept of a Content Type, which defines the features and behaviours of a certain type of content within an application. Content Types are therefore analogous to the concept of a "class" in an object-oriented programming language.

webCOMAND allows Content Types to extend others, which enables its field and behavior definitions to be shared automatically in the inheritance hierarchy without needing to redefine them. The real power of inheritance however is in the ability to query against several content types within an inheritance hierarchy in a single query, allowing application logic to naturally group related content.

Consider the Product example from earlier posts in this series:

ContentType_100 (Products)
ID Title Price ...
1 Tank with logo graphic 15.00  
2 T-shirt with logo graphic 20.00  
3 Polo shirt with logo graphic 25.00  

*Note that we have omitted extra metadata columns introduced in our last post for simplicity here.

Suppose our application needs a new type of product in its product catalog. Products in this simple example have a Title and Price, but clothing like this would likely also specify the material it is made of. Other types of products, like appliances, would have no materials but may have model numbers, whereas including books in this catalog will have authors.

We can imagine that all products in the catalog will have a Title and Price, at least for our simple example. We will therefore break out Clothing, Appliances, and Books into three subtypes of Product. These three new Content Types will share the same Title and Price fields, but will each define their own specialized fields pertaining to the type of product they represent.

ContentType_100 (Products)
ID Title Price ...
ContentType_101 (Clothing)
ID Title Price Material ...
1 Tank with logo graphic 15.00 cotton  
2 T-shirt with logo graphic 20.00 cotton  
3 Polo shirt with logo graphic 25.00 polyester  
ContentType_102 (Appliances)
ID Title Price Model ...
1 Microwave Oven with Fan 200.00 MWO1234L  
2 Stainless Refrigerator with Drawer Freezer 900.00 FDBF9876  
3 Gas Range with Central Burner 600.00 OR8273-G  
ContentType_103 (Books)
ID Title Price Author ...
1 The Call of the Wild 20.00 Jack London  
2 Anne of Green Gables 12.00 L. M. Montgomery  
3 Life and Times of Frederick Douglass 14.00 Frederick Douglass  

While these are stored as separate database tables in the underlying relational database, the Content Database API should manage creating these types and keeping them up to date. For example, if another common field, Description, is added to the Product Content Type, the API should also add this same column definition to every type that extends it. Similarly, the API should facilitate creating new types of Products and automatically reuse shared field definitions where applicable.

Querying Inherited Types

webCOMAND introduces a language feature to its context query language (cQL) that enables querying either for a specific content type, or for content types and those that inherit from it. For example, the following cQL from previous posts in this series queries only for generic Product records:

SELECT Title, Price FROM Product WHERE Price<100

However, adding a + to the Content Type in the FROM clause specifies that we should include all types of products in the query results:

SELECT Title, Price FROM Product+ WHERE Price<100

Translating this query from cQL to SQL on the underlying database becomes a bit more complex. One approach would be to UNION results across the multiple database tables. While this would work in this simple case, this would begin requiring subqueries and create other performance issues if we also need JOIN clauses from other tables. This would present scalability challenges if there were for example 100 types of products in the system, or more. We can mitigate these issues by adding Object and Field tables to our database schema.

Adding an Object Table

We can begin to address this by introducing an object table, which tracks all content in the repository. The API will manage this table whenever content is created or deleted, and provides us a single table to begin our query across multiple types at once. This Object table will hold a row for every content row in the repository, including all of its meta data. Here we include the auto-increment primary key ID column of the object table (for uniquely referencing an object row), the object's OID (Object ID, introduced in this post), the CT_ID (unique ID of the content type), and C_ID (the unique auto-increment key ID in the content type table).

ContentType_2 (Objects)
1 1 101 1
2 2 101 2
3 3 101 3
4 4 102 1
9 9 103 3

As with the abstracted Content Type tables above, the API is responsible for making sure that every object in the repository is added to this Objects table as well. This ensures consistent data whether querying against just one type or against multiple types.

Adding Field Tables

The Title and Price field, which were introduced in the Product content type, are shared among all of its subtypes. Querying across all types of products would require a query from the generic Object table, and then joining back to the original tables to get their titles and prices. To avoid excessive joining in this common scenario, we also introduce field tables representing the unique field identifiers in the system. The 'Title' and 'Price' field tables holds rows for each object in the repository with Titles and Prices, in this example, and allows for cross-type comparisons of common fields without joining to every table that has a these fields defined.

1 101 1 Tank with logo graphic
2 101 2 T-shirt with logo graphic
3 101 3 Polo shirt with logo graphic
4 102 1 Microwave Oven with Fan
9 103 3 Life and Times of Frederick Douglass
1 101 1 15.00
2 101 2 20.00
3 101 3 25.00
4 102 1 200.00
9 103 3 14.00

The API manages these field tables as Content Types are created and extended, and ensures that records are created in each when content is added to the repository.1

Querying Inherited Types with Objects and Fields

Our cQL query from above, expressed simply for all Products in the repository under $100, is as follows:

SELECT Title, Price FROM Product+ WHERE Price<100

Now, because we have data replicated to our object and field tables, the SQL looks as follows:

SELECT Title.Value, Price.Value
FROM ContentType_2 AS Object
LEFT JOIN ContentField_Title AS Title 
    ON (Object.CT_ID=Title.CT_ID AND Object.C_ID=Title.C_ID)
LEFT JOIN ContentField_Price AS Price 
    ON (Object.CT_ID=Price.CT_ID AND Object.C_ID=Price.C_ID)
WHERE Object.CT_ID IN (101,102,103) AND Price.Value<100

Representing Relationships and References

The above example is simple and does not contain any relationships that are references. For example, books may belong in one or more categories, or clothing may be tagged as "outdoor" and "casual". Categories can also be content in the system, and objects in a content database should be able to store relationships and query against them.

In webCOMAND we represent relationships in a JOIN table that allows us to express arbitrary many-to-many relationships. Suppose for example that we create a Categories content type and identify two for books, as follows:

ContentType_104 (Categories)
ID Label
1 Fiction
2 Non-Fiction

We would then create entries in our single JOIN table, which in webCOMAND is called ContentReference in the database.

ID ParentField ParentType ParentID ChildType ChildID
1 1 103 1 104 1
2 1 103 1 104 1
3 1 103 1 104 2

This table relates "Parent" objects (in this case, Books) with "Child" objects (in this case, Categories). The ParentType/ChildType columns refer to the unique Content Type IDs, and the ParentID/ChildID columns refer to the IDs from those Content Type tables.2 3

Querying for Content Relationships

This abstracted JOIN table approach allows us to generate queries using the simpler cQL language that can automatically be turned into more complex SQL. We use 'dot' notation to traverse relationships. In this example, we are querying for all Products where one of their assigned category(ies) is called 'Fiction'.

SELECT Title, Price 
FROM Product+ 
WHERE Price<100 AND Category.Label='Fiction'

This becomes the more complex SQL which returns 2 books, based on our reference table rows above.4

SELECT Title.Value, Price.Value
FROM ContentType_2 AS Object
LEFT JOIN ContentField_Title AS Title 
    ON (Object.CT_ID=Title.CT_ID AND Object.C_ID=Title.C_ID)
LEFT JOIN ContentField_Price AS Price 
    ON (Object.CT_ID=Price.CT_ID AND Object.C_ID=Price.C_ID)
LEFT JOIN ContentReference AS CategoryRef 
    ON (Object.CT_ID=CategoryRef.ParentType AND Object.C_ID=CategoryRef.ParentID)
LEFT JOIN ContentType_104 AS Category
    ON (CategoryRef.ChildType=104 AND CategoryRef.ChildID=Category.ID)
WHERE Object.CT_ID IN (101,102,103) AND Price.Value<100 AND Category.Label='Fiction'

1 webCOMAND stores field data both in the primary Content Type table as well as in the introduced field tables. This is largely for backward compatibility with previous versions of webCOMAND, and future versions may fully abstract field data into these field tables and away from the Content Type tables to reduce storage requirements and load on the database.

2 The ContentReference table also defines a ParentField column which identifies the field ID from the Product's Content Type definition. Content Type Fields, like Content Types, are also represented in the repository as content and are assigned unique IDs. See this post for more information on how Content Types are defined and the benefits of representing core types in this way. For simplicity we are glossing over this detail here.

3 Also, for backward compatibility, Content References are identified by ID, whereas simple data fields like Title are identified by their string Identifier. These are also grouped into a single JOIN table, as opposed to breaking out by field ID or Identifier, for backward compatibility with earlier versions of webCOMAND. There are benefits to the string approach, where multiple different fields with the same Identifier can be lumped and queried together (for example, INTs, BIGINTs, and FLOATs). This does not however have the benefits of abstraction that numeric IDs give us. Future versions of webCOMAND will address these inconsistencies for simplicity and runtime performance.

4 Because in our example Categories have no inherited Content Types, we can use its content type table directly for simplicity and performance. If we introduced sub-types for Categories, the API engine would automatically substitute JOINs to the abstracted Object and Field tables accordingly.

Photo by Clint Adair on Unsplash