The problem, the options and our approach

The problem

We are migrating away from Microsoft Dynamics 2011 (CDMS) and have decided to build a new CRM system (Data Hub) using a gradual, incremental approach.

During a period of several months, the following constraints apply:

  • data between CDMS and Data Hub needs to be kept in sync
  • Data Hub needs to allow re-modelling by adding/removing types/properties
  • some users will continue to use CDMS whilst we transition from one system to the other

The options

We considered different approaches including:

  • use CDMS as the data store and access it directly. This has many disadvantages, including having to keep hosting CDMS, not being able to change the schemas easily, architectural complexity etc.
  • use two data stores with some sort of low-level synchronisation (via the database or background processes). This also has many disadvantages, including integration with old technologies (Dynamics 2011), two separate layers (code and sync logic) that depend on each other tightly and are hard to manage, synchronisation conflicts etc.
  • use two data stores with code-managed synchronisation. This is the chosen architecture; it has some disadvantages as well, which we explain later.

The chosen approach

Two data stores, with reads and writes to CDMS happening as usual and synchronisation triggered by I/O operations in Data Hub.

Writes to Data Hub will (see the sketch after this list):
  • get the object from CDMS (if it exists)
  • apply the changes and write to CDMS
  • apply the changes in Data Hub
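
As an illustration only, a minimal sketch of the write path, assuming a hypothetical cdms_client wrapper around the Dynamics 2011 API, a to_cdms_data() serialiser and a cdms_modified_on field on the local model (none of these names come from the actual codebase):

    from django.db import transaction

    def save_to_both_systems(obj, cdms_client):
        """Write-through sync: update CDMS first, then the local copy."""
        with transaction.atomic():
            # Get the object from CDMS, if it already exists there
            cdms_data = cdms_client.get(obj.pk)
            if cdms_data is not None:
                # Apply the changes and write them to CDMS first ...
                cdms_data = cdms_client.update(obj.pk, obj.to_cdms_data())
            else:
                cdms_data = cdms_client.create(obj.to_cdms_data())
            # ... then apply the same changes in Data Hub, recording when
            # CDMS last saw the object so reads can detect staleness
            obj.cdms_modified_on = cdms_data['modifiedon']
            obj.save()
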
Reads from Data Hub will (see the sketch after this list):
  • get the object from the Data Hub data store
  • get the related object from CDMS
  • check if CDMS was updated after the last synchronisation
  • if so, update the Data Hub object
  • return the local results
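
And a matching sketch of the read path, under the same assumptions (update_from_cdms_data() is likewise a hypothetical helper):

    from django.db import transaction

    def get_with_sync(model_cls, pk, cdms_client):
        """Read-through sync: refresh the local object if CDMS is newer."""
        with transaction.atomic():
            # Get the object from the Data Hub data store, locking the row
            obj = model_cls.objects.select_for_update().get(pk=pk)
            # Get the related object from CDMS
            cdms_data = cdms_client.get(pk)
            # Check if CDMS was updated after the last synchronisation;
            # if so, update the Data Hub object from the CDMS data
            if cdms_data['modifiedon'] > obj.cdms_modified_on:
                obj.update_from_cdms_data(cdms_data)
                obj.cdms_modified_on = cdms_data['modifiedon']
                obj.save()
            # Return the local results
            return obj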

Read and write operations are performed as a single transaction so that changes are rolled back in case of exceptions with CDMS (hence the transaction.atomic blocks in the sketches above).

The same object on both systems is considered in sync if the value of the modified field is the same on both. If the CDMS version's modified value is more recent, the Data Hub object has to be updated from the CDMS one. If the Data Hub version's modified value is more recent, an exception is raised, as this should never happen: writes to Data Hub always generate writes to CDMS, but the reverse is not true.
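
In code, that comparison might look like this (again, the names are assumptions, and both arguments are timezone-aware datetimes):

    class SyncConflictError(Exception):
        """Raised when the Data Hub copy is unexpectedly newer than CDMS."""

    def needs_refresh(local_modified_on, cdms_modified_on):
        """Decide whether the local object must be updated from CDMS."""
        if cdms_modified_on == local_modified_on:
            return False  # in sync, nothing to do
        if cdms_modified_on > local_modified_on:
            return True   # CDMS changed since the last synchronisation
        # Data Hub being newer should be impossible: every Data Hub write
        # also writes to CDMS, so CDMS can never lag behind
        raise SyncConflictError('Data Hub object newer than its CDMS counterpart')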

The possibility of conflicts is low as:

  • objects on the two systems are kept in sync via the modified field, which is updated after each CDMS get
  • concurrent operations to a single object are low or non-existent in volume

If two updates happen at approximately the same time, the last one wins. This should not be a problem, as the system keeps a history of the changes.

Limitations

There are some limitations in using this approach:

  • Volume of requests. This has not been measured yet but could (and should) be partially addressed by some sort of caching strategy
  • The synchronisation happens using one common CDMS user
  • Some Django ORM APIs cannot easily be implemented, e.g. Model.objects.count() or Model.objects.filter(field1__field2='something'). This is mainly because of the old CDMS technologies
  • It might not be easy to change the Django schema in many cases, as the sync layer prefers a one-to-one mapping.