Abstracting Repository Content
Defining Content Types and query language abstraction lends power over a traditional database
This is the second posting in the series Implementing a Content Database In 10 Steps. Please see previous posts for more background on our working example here.
Introduction and Common Terms
It is useful in this tutorial to first identify some terms that are used by webCOMAND and other technologies in this space. Many of these will be further clarified in this article and in later posts on this topic.
- Content: database data with additional context, such as organization, authorizations, version control, and other common features
- Content Type: a system- or user-provided definition for a conceptual "type" of data. This can be "Customer" or "Product" in an e-commerce solution, or a "Blog Post" or other application-level concept. This is like a table in a traditional relational database, but may not directly correlate to a single database table in practice.
- Object: one particular piece of content. This is like a row in a traditional relational database, but may ultimately be stored in several rows in one or more tables to support the metadata providing context.
- Repository: the data store containing content for an application. This is implemented as a traditional relational or object database with an API that handles adding the context features.
A Content Database instance on a particular server is referred to as a "repository". A repository is roughly analogous to a database in a traditional application, and holds all related content in an application. A repository is an abstraction, however, from a traditional database and uses an Application Programming Interface (API) to manage interactions with its storage layer. webCOMAND's API implements a Content Database on top of a traditional relational database. Today we use MySQL (MariaDB), but our storage layer is abstracted to allow any technology to sit underneath with an adapter.
Implementing Content Types
The first step in implementing a content database, and adding important contextual features like access control, inheritance, organization, and variants. Consider our Product Catalog example:
|1||Tank with logo graphic||15.00|
|2||T-shirt with logo graphic||20.00|
|3||Polo shirt with logo graphic||25.00|
We can consider Products to be a 'Content Type'; that is, a formal definition of fields and behaviors for a specific type of content in an application. The Content Database should allow users to define Content Types much like they would define tables in a traditional database; however, the content database has the power to enable other features over what a traditional database would provide.
Because a Content Database is implemented on top of a traditional database, we can model Content Types as tables in the relational database schema. However, our first abstraction will be to decouple the name of the Content Type from the table itself. We will therefore define Content Types in their own database lookup table and assign them each a unique auto-incrementing ID.
There are a number of benefits to this approach. First, we avoid costly ALTER TABLE statements when we change the Content Type identifier. ALTER TABLE statements will implicitly commit transactions in a number of database implementations, which would prevent us from making a Content Type identifier change in addition to other edits in a single atomic transaction. Secondly, changing a commonly used Content Type identifier would require all logic making queries to it to change right away, making provisioning and rolling out changes complex and costly; by using a lookup table for the Content Type identifier we have more flexibility over how we roll out these changes.1
Another important improvement to a simple Content Type lookup table, however, is to consider Content Types to be content as well, just like any other user-generated content in the repository. This allows us interact with types in the API, create relationships to them with other content, and query for them just like any other content. So, our lookup table might look like the following, saving the first several rows for reserved system content types:
|ContentType_1 (Content Types)|
So in this example the Products table will be renamed to ContentType_100. The API can handle this naming abstraction however, and will have code that enables it to determine the tables to query internally to map content types to their IDs. This means that in CQL we can still query for Titles and Prices of Products with the following query:
However, in SQL, this becomes the slightly more complex
Our Product table example (ContentType_100) above contains an auto-increment PRIMARY KEY column called 'ID', which is common in relational database schemas. While this allows us to uniquely identify a single row in a table, in a Content Database it is useful to assign additional unique identifiers for content as well. In future blog posts we will show how we may need multiple database rows to represent all states of a single content record, and so an ID that is shared across all rows for a content record is useful for queries and application logic.
In webCOMAND an OID, or Object ID, uniquely identifies content within a single repository. This is an integer which makes querying, storing, and working with OIDs as simple as possible. We implement OIDs by adding a content_oid table to our database, auto-incrementing it to ensure that OIDs are unique across the entire repository. We also add an OID column to every content table in the database, and the API layer ensures that it is set whenever content is created.
webCOMAND also introduces universally unique identifiers, or UUIDs, that identify content across webCOMAND repositories. This allows content to be shared and updated between webCOMAND instances, facilitating apps syndicating content across multiple servers, or even a marketplace of webCOMAND content. UUIDs are 16 byte binary identifiers generated using common algorithms based on server identity and timestamps. The webCOMAND API ensures that content is assigned a UUID and uses this UUID to correlate data being imported from other systems.
So, our repository with its Content Types and Products now looks more like this:
|ContentType_1 (Content Types)|
|1||101||276af0cc-316c-11e9-874d-6003088974ec||Tank with logo graphic||15.00|
|2||102||276afc5c-316c-11e9-874d-6003088974ec||T-shirt with logo graphic||20.00|
|3||103||276b092c-316c-11e9-874d-6003088974ec||Polo shirt with logo graphic||25.00|
1 Note that we do not also abstract column names as field unique identifiers in webCOMAND. While Content Type fields are also represented as content in webCOMAND and are assigned a unique ID, we did not abstract the column titles for backward-compatibility purposes with earlier versions of our system. Future implementations will likely complete this abstraction to give us the same flexibility as we gain from abstracting Content Type tables from the internal database schema.