The Sense of the Past

This is the second post in a series dedicated to speculating about databases, which started with Event Daydreaming some weeks ago. The goal of the series is to suggest and test a different approach to saving data for future consumption in the context of software applications.

You may find the topic itself deadly dull and leave. Let me beg for your indulgence, for I promise exciting rewards are waiting for you at the end of this post.

Even though the list of database engines you can pick to integrate with your software application is long, the experienced software architect must recognize that the variety of solutions she can find in typical implementations is painfully narrow.

Roughly speaking, there is always one core database, either relational or non-relational, tightly coupled with the main application, and one or more specific-purpose databases loosely coupled with the application to provide complementary features.

For example, one PostgreSQL or DynamoDB database to save Domain data (Clients, Orders, etc.), and then one Redis instance for caching, one Prometheus instance for monitoring data, and so on. No matter how complicated the application topology might be, the architecture beneath is likely, and boringly, the same.

One might wonder, is it even possible to design more imaginative schemas for the relationship between applications and databases?

In fact, what is the purpose of this relationship in the first place?

Memory recoverability

There is only one goal for keeping a record of past events: memory. Software applications are no different from living beings in this respect. People and machines spend resources on saving data because we all assume that we’ll make use of (some pieces of) that data in the future.

Some of these memories may be extremely relevant for us, like those which create our identity, whereas others may be just configuration parameters we simply rely on to function.

The collection of all the existing ways of providing memory (databases, file storage, RAM, etc.) to applications is often called their persistence layer. The act of remembering would therefore be the action of going there, to the persistence layer, and pulling the desired data out of it.

The operation of picking facts out of that persistence layer is more intriguing than it seems, though. Let’s see why.

The readers of this blog are already aware that software applications make sense to their users because applications tell stories. These stories happen in the world where these users live, specifically in that coherent fragment that we call the Domain, and whose elements (companies, goods, shipping operations, etc.) we represent in the software application as meaningful packages of data and behaviour named Entities.

For example, in a fashion firm, clothes will likely be represented in its e-commerce assistant software as Item Entities. These Entities exist in the persistence layer as data structures with properties like size, colour, price, etc. And the same goes for Orders, Invoices, etc.

In other words, applications do not save data in their persistence layer the way it exists in the Domain, but how those applications represent that data. Check out this post about the Codomain for further details.

To make things even more convoluted, we must realize that persistence layers do not handle Entities: they handle records in tables, JSON files, etc. So some kind of connecting bridge between what software applications understand (Entities) and what persistence layers handle (records, files) must be built.

In summary, whatever we have in the Domain (remember, our real world) is represented in software applications as Entities, and those Entities are eventually saved for future use as database tables, files, or some computer’s RAM in the persistence layer. Astonishingly, it works!

Cooling, warming, and hydrating

As I said above, we spend money provisioning a persistence layer to enable data recovery. In a way, recovering a recorded Entity from the persistence layer is like rehydrating a soup, a mushroom, or a seed. Setting technical details aside, this operation means taking an instance of an Entity with its properties empty and populating them with the data pulled out of the persistence layer. Indeed, some software intermediaries that bridge software applications and databases call this operation hydration.

The cycle of persisting and recovering entities

Most importantly, this hydration procedure must guarantee that the recovered Entity is identical to how it was the last time it was persisted. In other words, all the properties receive the same values that they had the moment they were sent to the persistence layer.

Just imagine how awkward it would be to see your profile’s name changed every time you check your social networks or your mailbox!

All in all, it looks pretty simple. We must ensure that the bridge that carries those property values to and from their place in the persistence layer is safe, and that they stay safe there.

In a way, we want the persistence layer tools to act like cryogenic devices: keep data as though it had been frozen until we need it again, and if changed, transport those changes to the persistence layer and freeze it back.
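To make the idea concrete, here is a minimal sketch of what hydration could look like in Python; the Item Entity and the record layout are illustrative assumptions on my part, not the API of any particular library or data mapper.

# A minimal sketch of hydration, assuming a Python application.
# The Item Entity and the record format are illustrative only.

from dataclasses import dataclass, fields

@dataclass
class Item:
    id: str = ""
    size: str = ""
    colour: str = ""
    price: float = 0.0

def hydrate(record: dict) -> Item:
    # Take an "empty" instance and populate its properties with the
    # values pulled out of the persistence layer.
    item = Item()
    for field in fields(Item):
        if field.name in record:
            setattr(item, field.name, record[field.name])
    return item

def dehydrate(item: Item) -> dict:
    # The return trip: turn the Entity back into a plain record the
    # persistence layer can freeze.
    return {field.name: getattr(item, field.name) for field in fields(Item)}

# record = {"id": "a1", "size": "M", "colour": "blue", "price": 59.9}
# item = hydrate(record)            # back to life
# assert dehydrate(item) == record  # frozen again, unchanged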

Now, what if the Entity has changed its properties or its behaviour, or both? What will happen when we pull back to life an instance of that Entity that was persisted before those changes took place?

Let’s illustrate this problem with an example. Imagine an entity Post first designed like this:

class Post
{
    UUID4 "id"
    String "title"
    DateTime "publishedAt"
    ...
}

Once persisted, one instance of a Post might look like this:

{
    "id": "5ab345d0-7078-4b7c-88c1-d2c9f440ab6c",
    "title": "The Sense of Time",
    "published_at": "2021-12-05 10:13:25 UTC"
}

Let’s imagine now that a new version of the application handles this different version of Posts:

class Post
{
    UUID4 "id"
    String "title"
    Int "publishedAt"
    Array<String> "tags"
    ...
}

According to this new structure, a new version of the Post instance above may be persisted like this:

{
    "id": "5ab345d0-7078-4b7c-88c1-d2c9f440ab6c",
    "title": "The Sense of the Past",
    "published_at": 1638699205,
    "tags": ["software architecture", "databases"]
}

What should happen when we rehydrate data that was persisted (and therefore frozen) with an old version of the Entity, a version that no longer exists in the application? And what should happen when data created with the latest version of the Entity is sent to the persistence layer for the very first time?

Both operations would end in a very unfortunate data clash unless we actively implement an adaptation procedure that captures whatever Entity is going to be saved and modifies the Entity’s properties to fit with what the persistence layer expects, and the other way around.

Complicated? Indeed! Let’s see what forms this adaptation procedure may take in practice.

The common approach to Recoverability

One of the unfortunate consequences of having a core database tightly coupled with a software application is that the bridge between them is never easy to change. This happens because every property in the Entity is linked to a column in a table or a property in a file. These links are also explicit, meaning we must keep them updated at all times.

So every change in either the Entity or the table requires changing the other accordingly, as well as the bridge that maps one to the other. In addition, we will need to update all the existing records in the affected tables too. All this is actually as risky and time-consuming as it seems.
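As an illustration of how explicit those links tend to be, here is a hedged Python sketch of a hand-written mapping for the Post Entity; the table name, the column names, and the migration steps mentioned in the comments are assumptions, not the syntax of any particular ORM.

# A sketch of the explicit links behind the standard approach.
# Table and column names are assumptions made for illustration.

POST_TABLE = "posts"

# Every Entity property is linked, by hand, to a column.
POST_COLUMNS = {
    "id": "id",
    "title": "title",
    "publishedAt": "published_at",
}

def to_row(post: dict) -> dict:
    # Entity -> table row, following the explicit mapping.
    return {column: post[prop] for prop, column in POST_COLUMNS.items()}

# When the Entity changes (publishedAt becomes an Int, tags appear),
# the mapping must change with it...
# POST_COLUMNS["tags"] = "tags"
# ...and so must the table (an ALTER TABLE plus a data migration that
# rewrites every stored published_at as an epoch integer).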

Even so, this strategy is indeed the standard approach to recoverability.

Given how expensive it is in terms of resources and risk, one wonders whether more flexible approaches might be better.

Other approaches to Recoverability

Deferred updating

This tactic simply means that we would write code after (when reading) or before (when writing) the hydration operation. This code would take the outdated representation and transform it into the version that the receiver of that representation (either the application when reading or the persistence layer when writing) can understand.

In the example above with Posts, using this deferred updating tactic would mean that every time an old-version Post is read from the database, the property published_at would be transformed from a DateTime to an Int.

Some call this a lazy migration procedure, for it does not involve running a full update of all the persisted records in one shot. Instead, the existing records are updated only when they are rehydrated. This operation may take days, weeks, or even months, during which both the old and the new versions of the table, or the file, will coexist.
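A hedged sketch of what this could look like for the Post example, in Python; detecting the old version by the shape of published_at, and the database.fetch/database.save calls, are assumptions made purely for illustration.

# A sketch of deferred (lazy) updating on read, for the Post example.

from datetime import datetime, timezone

def upgrade_post_record(record: dict) -> dict:
    # Old version: published_at stored as a datetime string.
    if isinstance(record.get("published_at"), str):
        raw = record["published_at"].replace(" UTC", "")
        published = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(
            tzinfo=timezone.utc
        )
        record["published_at"] = int(published.timestamp())
    # Old version had no tags at all.
    record.setdefault("tags", [])
    return record

def read_post(post_id, database):
    record = database.fetch("posts", post_id)  # hypothetical data-access call
    record = upgrade_post_record(record)       # adapt before hydrating
    database.save("posts", post_id, record)    # lazily persist the new shape
    return record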

The biggest advantage of this approach is that it keeps the logic of the adaptation in its right place, the application itself. Its main disadvantage is that it overcomplicates the exploitation of the data, for some records keep the old version whereas others use the new one.

Persistence proxy

The tight coupling between applications and core databases stated above is made explicit by the fact that the bridge between them is part of the application code. This means that the application knows which persistence engine it is going to use, and implements, from among the available bridges, the one that matches that persistence engine.

For example, if we have an application written in Python and a Redis database, we will have to implement the Redis bridge available for Python applications.

It is precisely this awareness that creates the coupling. So, what if we made that awareness disappear?

The simplest way to achieve this would be to keep all the knowledge about the database engine inside the bridge that connects it with the application. Under this architecture schema, the application would speak with the bridge and the bridge only. The bridge would know that the engine in use is Redis, but the application itself would remain completely unaware of that fact.

Under this schema, a Python application would only need to implement its connection to the bridge, whereas the bridge would be the place where the connector with Redis would exist.

As a way of recognising this promoted bridge, we may call it Persistence Proxy instead.

In the overall constellation of components that constitutes every software application, the persistence layer would be like a distant sector. Every component would command the Proxy to read data, write data, or both, but only the Proxy would know where, and how, that data is persisted.
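A minimal Python sketch of the idea, assuming Redis as the hidden engine; the PersistenceProxy interface and the class names are illustrative assumptions rather than a prescribed API.

# A sketch of a Persistence Proxy: the application talks to the Proxy,
# and only the Proxy knows that Redis sits behind it.

import json
from abc import ABC, abstractmethod

class PersistenceProxy(ABC):
    # The only contract the application is aware of.
    @abstractmethod
    def read(self, entity_type: str, entity_id: str) -> dict: ...

    @abstractmethod
    def write(self, entity_type: str, entity_id: str, data: dict) -> None: ...

class RedisPersistenceProxy(PersistenceProxy):
    # All the Redis-specific knowledge lives here and nowhere else.
    def __init__(self, redis_client):
        self._redis = redis_client

    def read(self, entity_type, entity_id):
        raw = self._redis.get(f"{entity_type}:{entity_id}")
        return json.loads(raw) if raw else {}

    def write(self, entity_type, entity_id, data):
        self._redis.set(f"{entity_type}:{entity_id}", json.dumps(data))

# The application only ever sees a PersistenceProxy:
# proxy: PersistenceProxy = RedisPersistenceProxy(redis.Redis())
# proxy.write("post", "5ab345d0-...", {"title": "The Sense of the Past"})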

However, following the standard way of designing software, this Proxy would still be an internal part of every software component. Would it be possible to go even further?

A more radical approach: persistence brokers

Once we break free of the coupling between applications and their persistence layer, a new world of possibilities opens before our eyes.

Notice that this liberation is not only structural but also mental: it should reinforce the idea that keeping memories is more a convenient operation than anything else. Persistence is instrumental, in the sense that it supports the true goal of every application: to provide useful features to users.

This means that software applications should be designed so that they can change their persistence providers at convenience. Or have as many persistence providers as they need.

Indeed, what this means is that software applications should not care about persistence whatsoever.

So, let’s define an external component to manage the whole persistence function, and let’s call this new component a Persistence Broker.

Instead of mapping the Entities to data structures in the persistence layer, applications would just deliver Data Commands to this Persistence Broker, and it would handle them as it saw fit. Unlike the Persistence Proxy we saw above, the Persistence Broker would be external and independent of the other components within the application.

The Persistence Broker would know the full catalogue of available persistence engines. It would take those Data Commands from the application and would divert them to the right persistence engine according to predefined Policies; or, more conveniently, with Machine Learning-based tactics.

The Persistence Broker would not only handle data reading and writing operations on demand from application components. It would also take an active role in deciding where to put what. For example, Entities with short lifespans may be saved into in-memory databases (e.g., Redis) whereas Entities with long lifespans may be saved in relational or non-relational databases.
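As a rough Python sketch of that idea: a Data Command plus a Broker that routes it according to a simple predefined Policy. The command shape, the lifespan-based rule, and the store names are all assumptions made for illustration; the stores are assumed to expose the same read/write contract as the Proxy above.

# A sketch of a Persistence Broker routing Data Commands by Policy.

from dataclasses import dataclass
from typing import Optional

@dataclass
class DataCommand:
    operation: str              # "read" or "write"
    entity_type: str
    entity_id: str
    payload: Optional[dict] = None

class PersistenceBroker:
    def __init__(self, in_memory_store, durable_store):
        self._stores = {"in_memory": in_memory_store, "durable": durable_store}

    def _route(self, command: DataCommand) -> str:
        # Predefined Policy (an assumption): short-lived Entities go to
        # the in-memory engine, long-lived ones to a disk-based one.
        short_lived = {"session", "cart"}
        return "in_memory" if command.entity_type in short_lived else "durable"

    def handle(self, command: DataCommand):
        store = self._stores[self._route(command)]
        if command.operation == "write":
            return store.write(command.entity_type, command.entity_id, command.payload)
        return store.read(command.entity_type, command.entity_id)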

More sophisticated strategies may be possible too. If we thought of Entities with long lifespans, we would soon realize some entities in this category are more stable than others. For example, Customers are usually created once and tend to stay stable for a long time until someday they change again. Orders or Invoices, on the other hand, suffer multiple changes at the beginning of their lives, and after that, they stay solidly unchanged forever.

A Persistence Broker may be trained to keep those Entities in memory until it realizes their period of changes is finished, and then it would move them to a disk-based database of some kind.
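A hedged sketch of such a cooling rule; the 30-day threshold, the last-changed timestamp, and the delete method on the in-memory store are assumptions, and a trained Broker would presumably infer such thresholds from observed change patterns instead.

# A sketch of a cooling rule: once an Entity has stopped changing for
# long enough, move it from the in-memory store to a disk-based one.

import time

COOLING_PERIOD_SECONDS = 30 * 24 * 3600   # arbitrary 30-day assumption

def cool_down(entity_type, entity_id, last_changed_at, in_memory_store, durable_store):
    # Still inside its period of changes: keep it warm.
    if time.time() - last_changed_at < COOLING_PERIOD_SECONDS:
        return
    # Otherwise, freeze it on disk and evict it from memory.
    data = in_memory_store.read(entity_type, entity_id)
    durable_store.write(entity_type, entity_id, data)
    in_memory_store.delete(entity_type, entity_id)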

Wrap up

As I said in the first post in this series, Daydreaming can be unexpectedly rewarding. Do you dare to wonder what great surprises the next chapter may bring?

What if we see a Persistence Proxy, and an application using it, in action?

Copyright notice: the picture is a still from the film Memento (2000), used without permission.
