Abstracting Repository Content

Defining Content Types and query language abstraction lends power over a traditional database

By Cathy Burley December 15, 2018

This is the second posting in the series Implementing a Content Database In 10 Steps. Please see previous posts for more background on our working example here.

Introduction and Common Terms

It is useful in this tutorial to first identify some terms that are used by webCOMAND and other technologies in this space. Many of these will be further clarified in this article and in later posts on this topic.

Content: database data with additional context, such as organization, authorizations, version control, and other common features
Content Type: a system- or user-provided definition for a conceptual "type" of data. This can be "Customer" or "Product" in an e-commerce solution, or a "Blog Post" or other application-level concept. This is like a table in a traditional relational database, but may not directly correlate to a single database table in practice.
Object: one particular piece of content. This is like a row in a traditional relational database, but may ultimately be stored in several rows in one or more tables to support the metadata providing context.
Repository: the data store containing content for an application. This is implemented as a traditional relational or object database with an API that handles adding the context features.

A Content Database instance on a particular server is referred to as a "repository". A repository is roughly analogous to a database in a traditional application, and holds all related content in an application. A repository is an abstraction, however, from a traditional database and uses an Application Programming Interface (API) to manage interactions with its storage layer. webCOMAND's API implements a Content Database on top of a traditional relational database. Today we use MySQL (MariaDB), but our storage layer is abstracted to allow any technology to sit underneath with an adapter.

Content databases implement many content-oriented contextual features like access control, inheritance, organization, and variants. In doing so it is essential that interactions with the database are abstracted through the API, because the underlying database queries would otherwise be come unmanageably complex. One common approach is to introduce API functions in native languages like PHP or JavaScript, or through a Web Service interface for remote access. For example GraphQL, the query language used by GraphCMS, uses structured JSON to define and query the repository content. webCOMAND approaches this by introducing an SQL-like query language called cQL (Context Query Language). cQL implements many familiar SQL clauses and functions while adding new language elements that expose its Content Database features. For the rest of this blog series we will provide side-by-side comparisons of cQL and the underlying SQL as we highlight how the Content Database is implemented.

Implementing Content Types

The first step in implementing a content database, and adding important contextual features like access control, inheritance, organization, and variants. Consider our Product Catalog example:

Products
ID	Title	Price	...
1	Tank with logo graphic	15.00
2	T-shirt with logo graphic	20.00
3	Polo shirt with logo graphic	25.00

We can consider Products to be a 'Content Type'; that is, a formal definition of fields and behaviors for a specific type of content in an application. The Content Database should allow users to define Content Types much like they would define tables in a traditional database; however, the content database has the power to enable other features over what a traditional database would provide.

Because a Content Database is implemented on top of a traditional database, we can model Content Types as tables in the relational database schema. However, our first abstraction will be to decouple the name of the Content Type from the table itself. We will therefore define Content Types in their own database lookup table and assign them each a unique auto-incrementing ID.

There are a number of benefits to this approach. First, we avoid costly ALTER TABLE statements when we change the Content Type identifier. ALTER TABLE statements will implicitly commit transactions in a number of database implementations, which would prevent us from making a Content Type identifier change in addition to other edits in a single atomic transaction. Secondly, changing a commonly used Content Type identifier would require all logic making queries to it to change right away, making provisioning and rolling out changes complex and costly; by using a lookup table for the Content Type identifier we have more flexibility over how we roll out these changes.¹

Another important improvement to a simple Content Type lookup table, however, is to consider Content Types to be content as well, just like any other user-generated content in the repository. This allows us interact with types in the API, create relationships to them with other content, and query for them just like any other content. So, our lookup table might look like the following, saving the first several rows for reserved system content types:

ContentType_1 (Content Types)
ID	Title
1	Content Type
...
100	Product

So in this example the Products table will be renamed to ContentType_100. The API can handle this naming abstraction however, and will have code that enables it to determine the tables to query internally to map content types to their IDs. This means that in CQL we can still query for Titles and Prices of Products with the following query:

SELECT Title, Price FROM Product WHERE Price<100

However, in SQL, this becomes the slightly more complex

SELECT Title, Price FROM ContentType_100 WHERE Price<100

Adding Identifiers

Our Product table example (ContentType_100) above contains an auto-increment PRIMARY KEY column called 'ID', which is common in relational database schemas. While this allows us to uniquely identify a single row in a table, in a Content Database it is useful to assign additional unique identifiers for content as well. In future blog posts we will show how we may need multiple database rows to represent all states of a single content record, and so an ID that is shared across all rows for a content record is useful for queries and application logic.

In webCOMAND an OID, or Object ID, uniquely identifies content within a single repository. This is an integer which makes querying, storing, and working with OIDs as simple as possible. We implement OIDs by adding a content_oid table to our database, auto-incrementing it to ensure that OIDs are unique across the entire repository. We also add an OID column to every content table in the database, and the API layer ensures that it is set whenever content is created.

webCOMAND also introduces universally unique identifiers, or UUIDs, that identify content across webCOMAND repositories. This allows content to be shared and updated between webCOMAND instances, facilitating apps syndicating content across multiple servers, or even a marketplace of webCOMAND content. UUIDs are 16 byte binary identifiers generated using common algorithms based on server identity and timestamps. The webCOMAND API ensures that content is assigned a UUID and uses this UUID to correlate data being imported from other systems.

So, our repository with its Content Types and Products now looks more like this:

ContentType_1 (Content Types)
ID	OID	UUID	Title
1	1	276acbe2-316c-11e9-874d-6003088974ec	Content Type
...
100	100	276ae0d2-316c-11e9-874d-6003088974ec	Product

ContentType_100 (Products)
ID	OID	UUID	Title	Price	...
1	101	276af0cc-316c-11e9-874d-6003088974ec	Tank with logo graphic	15.00
2	102	276afc5c-316c-11e9-874d-6003088974ec	T-shirt with logo graphic	20.00
3	103	276b092c-316c-11e9-874d-6003088974ec	Polo shirt with logo graphic	25.00

¹ Note that we do not also abstract column names as field unique identifiers in webCOMAND. While Content Type fields are also represented as content in webCOMAND and are assigned a unique ID, we did not abstract the column titles for backward-compatibility purposes with earlier versions of our system. Future implementations will likely complete this abstraction to give us the same flexibility as we gain from abstracting Content Type tables from the internal database schema.

Cover Photo by Dmitry Ratushny on Unsplash