Splitting a live Django application in two

[Image: a house being split in two]

We recently simplified an application that had grown a little too big for its own good. What started off as a single text field on a data model had grown into a separate model, then eventually a series of models and views. We found ourselves working on a monolithic application, with the majority of our code living in a couple of massive files.

Refactoring this on an app that has not yet been released is pretty straightforward, but it gets much trickier when the site is live and has significant traffic. We wanted a solution that was automated and easy to test, so we could be totally sure that the migration would work flawlessly on our production site.

We were already familiar with how Django's migration system makes it easy to change data models, but it turns out it's also great at more complex data migrations. We wanted to automate the process so that everything would be migrated in one command; Django migrations are executed sequentially, can be run automatically on deployment, can be rolled back if there are failures, and keep track of which steps have been applied, which makes them ideal for more complex refactoring.

Splitting the code up

We started by creating a new Django app and moving the relevant models.py, views.py, urls.py, admin.py, forms.py and other related code into the new app. We then updated all references to point to the new locations. Many IDEs have built-in refactoring tools that handle renaming the import statements automatically. Pretty straightforward so far!
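
For example, an import that used to point at the old app just needs to point at the new one (a minimal sketch, using a hypothetical Report model):

# Before: the model lived in the original app
from old_app.models import Report

# After: the same model now lives in the new app
from new_app.models import Report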

If you are working with an app that is under development, you are pretty much ready to go at this point. Simply run manage.py makemigrations to create migrations for the new models, and you're done.

Unfortunately, it's a little more complicated with a live app.

Moving live data models to a new app manually

If you are comfortable with manually migrating the data, you can take advantage of Django's built-in dumpdata and loaddata management commands to copy over data from the old modesl to the new. This works well if you are comfortable taking the website offline for a while, running dumpdata before the migrations are run, and then loading the new data post-migration.
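
A rough sketch of that manual flow, using Django's call_command wrapper and a hypothetical fixture file name:

from django.core.management import call_command

# Dump the old model's data to a fixture before any schema changes
call_command('dumpdata', 'old_app.OldModel', output='old_model.json')

# ...take the site offline and apply the migrations...

# Rewrite the fixture's "model" labels to point at the new app's model,
# then load it into the new table
call_command('loaddata', 'old_model.json')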

We decided against this approach because it isn't automated. Instead, we wanted something we could run repeatedly on our local and staging servers until we were confident that the migration would work perfectly.

Moving live data with data migrations

We handled the move with a series of data migration steps. Here are the steps we took:

1. Create a copy of each model you are migrating in the new app

Contrary to what might seem logical, you shouldn't delete models from the old app immediately. Instead, create a copy of those models in your new app and generate the new app's initial migration. You should now have two identical models - one in the old app and one in the new.
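
As a minimal sketch, assuming the model being moved has the field1 and field2 fields used in the examples below (the field types here are illustrative), the new app ends up with a near-identical copy:

# new_app/models.py
from django.db import models

class NewModel(models.Model):
    # same definitions as the old model; names and types are illustrative
    field1 = models.CharField(max_length=255)
    field2 = models.TextField(blank=True)

Running makemigrations <new_app_name> then generates the initial migration that will create this table alongside the old one.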

2. Copy data into the new models via data migrations

Next, create an empty migration file with the makemigrations --empty <new_app_name> management command; this will hold the code that transfers data from the old models to the new ones. The data migration should iterate through the instances of the old model and create matching instances of the new model:

# 000x_data_migration.py
from django.db import migrations


def copy_data(apps, schema_editor):
    # Use the historical versions of the models, not direct imports
    OldModel = apps.get_model('old_app', 'OldModel')
    NewModel = apps.get_model('new_app', 'NewModel')
    for old_model_instance in OldModel.objects.all():
        new_model_instance = NewModel(
            pk=old_model_instance.pk,  # keep the same IDs so foreign keys can be remapped in step 3
            field1=old_model_instance.field1,
            field2=old_model_instance.field2,
            # ...any remaining fields...
        )
        new_model_instance.save()


class Migration(migrations.Migration):
    # dependencies = [...] -- must include the new app's initial migration
    operations = [
        migrations.RunPython(copy_data, migrations.RunPython.noop),
    ]

3. Migrate references to the new models

Your application may have models that reference the models you are moving to the new app. If so, create an additional field on those models that references the new models:

class DependantModel(models.Model):
    foreign_key_field = models.ForeignKey(OldModel, on_delete=models.CASCADE)

    # temporary field pointing at the copied model; nullable so it can be
    # added to existing rows before the data is backfilled
    new_foreign_key_field = models.ForeignKey(NewModel, null=True, blank=True, on_delete=models.CASCADE)

Next, run makemigrations <dependant_app_name> to create the migration file for the new field. Within this migration file, add a data migration function that copies each row's old foreign key value into the new field:

# dependant app migrations
from django.db import migrations, models
from django.db.models import F


def copy_data(apps, schema_editor):
    DependantModel = apps.get_model('dependant_app', 'DependantModel')
    # Copy the referenced IDs across in a single UPDATE. This relies on the
    # new models keeping the same primary keys as the old ones (see step 2).
    DependantModel.objects.update(new_foreign_key_field_id=F('foreign_key_field_id'))


class Migration(migrations.Migration):
    # ...
    operations = [
        migrations.RunPython(copy_data),
    ]

After the data is migrated, remove the old foreign key field and rename the new field to the old field's name, so dependent code that uses that field name keeps working:

class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(copy_data),
        migrations.RemoveField(
            model_name='dependant_model',
            name='foreign_key_field',
        ),
        migrations.RenameField(
            model_name='dependant_model',
            old_name='new_foreign_key_field',
            new_name='foreign_key_field',
        ),
    ]

4. Create migrations to remove the old models

Remove the old models from the old app's models.py, then run makemigrations <old_app_name> to create a migration that deletes them.
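
The generated migration will contain something along these lines (a sketch; makemigrations fills in the dependencies):

# old_app migration
from django.db import migrations


class Migration(migrations.Migration):
    # the dependencies must include the data migrations above, so the old
    # models are only deleted after their data has been copied
    dependencies = []

    operations = [
        migrations.DeleteModel(name='OldModel'),
    ]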

5. Apply migrations

Apply the migrations using the command python manage.py migrate.

Voilà: you have automated creating the new data models, copying the data over to them, and deleting the old models. The migrate command handles all of this for you, in a way that is easy to test and repeat on production.

Testing your migrations

The big advantage of this approach is that it makes it much easier to validate that your migrations are working correctly: there is no human involvement in the migration process, and the behaviour will be identical on staging and production. In addition to manually testing the application post-migration, there are a couple of additional steps you can take to ensure that the data was copied correctly:

  • Use assertions in the data migration to check that the data was copied correctly. This lets you abort the migration before any deletion commands are run if the data has not been properly copied over (see the sketch after this list).
  • Check the data in the new models to ensure that it matches the data in the old models.
  • Check the dependent apps to ensure that the data was copied correctly and that the application is working as expected.
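
As a minimal sketch of the first point, assuming the copy_data function from step 2, a row-count assertion at the end of the function will stop the migration before anything destructive runs:

def copy_data(apps, schema_editor):
    OldModel = apps.get_model('old_app', 'OldModel')
    NewModel = apps.get_model('new_app', 'NewModel')
    # ...copy the instances as shown above...

    # Fail loudly if the copy is incomplete, so later migrations never
    # delete the old data based on a bad copy
    assert NewModel.objects.count() == OldModel.objects.count(), \
        "Row counts differ between OldModel and NewModel"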

It's always a good idea to have a backup of the database before running any data migration so that you can restore it if anything goes wrong. You should also run the migration commands on a staging server that has a recent copy of your production database, so you are not caught out by irregularities in real data.

Complex data fields

DateTimeFields

If you have auto-generated timestamps in your old models (like created_at with auto_now_add=True and updated_at with auto_now=True) and you want to keep their values rather than have them regenerated, define those fields in the new models without the auto options and make them nullable by adding null=True and blank=True, so the copied values are not overwritten when the new instances are saved.

Once you have copied the data from the old models to the new models, remove null=True and blank=True from those fields and add auto_now_add=True and auto_now=True back.
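
A sketch of the temporary definitions on the new model while the copy runs (the auto options are restored afterwards):

# new_app/models.py -- temporary definitions during the copy
from django.db import models

class NewModel(models.Model):
    # ...other fields...
    created_at = models.DateTimeField(null=True, blank=True)
    updated_at = models.DateTimeField(null=True, blank=True)

# In the data migration the timestamps can then be copied verbatim, e.g.
#   new_model_instance.created_at = old_model_instance.created_at
#   new_model_instance.updated_at = old_model_instance.updated_at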

ManyToManyFields

ManyToManyFields are a bit trickier to handle. A ManyToManyField is represented by an intermediate table that has a foreign key to each of the related models. When you copy the data from the old models to the new models, this table is not copied by default, and when you remove the old models, the rows in the intermediate table will be removed as well.

To handle this, you can get hold of the through model of the newly created ManyToManyField and copy the rows of the old intermediate table across as well. Note that the field names on an auto-created through model are derived from the lowercased names of the models on each side of the relation:

# 0002_copy_data.py (inside the copy_data function from step 2)
# The auto-created "through" model behind the new ManyToManyField
ThroughModel = NewModel.m2m_field.through

for old_model_instance in OldModel.objects.all():
    new_model_instance = NewModel(
        field1=old_model_instance.field1,
        field2=old_model_instance.field2,
        # ...
    )
    new_model_instance.save()
    # Re-create each row of the old intermediate table against the new model.
    # The keyword arguments must match the through model's field names.
    for m2m_instance in old_model_instance.m2m_field.all():
        ThroughModel.objects.create(
            newmodel=new_model_instance,   # FK back to NewModel
            relatedmodel=m2m_instance,     # FK to the related model (name is illustrative)
        )

FileFields

If you have any FileFields in your old models, you need to copy the files themselves over to the new models as well. One way to do this is to use the FileField's save method to copy each file from the old instance to the new one. Note that save uses the name you pass it to store the file; if you don't want to reuse the old file's full path, pass os.path.basename of the old name to keep just the file name:

# 0002_copy_data.py (inside the copy_data function from step 2)
import os

for old_model_instance in OldModel.objects.all():
    new_model_instance = NewModel(
        field1=old_model_instance.field1,
        field2=old_model_instance.field2,
        # ...
    )
    new_model_instance.save()
    if old_model_instance.file_field:
        # save() copies the file contents into the new field's storage
        new_model_instance.file_field.save(
            os.path.basename(old_model_instance.file_field.name),
            old_model_instance.file_field,
        )

Auto incrementing IDs

One thing to keep in mind is that, because the primary keys were set explicitly during the copy, the auto-increment sequence on the new table will not have advanced. On PostgreSQL you can add a RunSQL operation that uses the setval function to set the new sequence to the maximum ID of the old table:

# 0002_copy_data.py
...
migrations.RunSQL(
    "SELECT setval('new_app_newmodel_id_seq', (SELECT MAX(id) FROM old_app_oldmodel));"
),

Happy migrating!

It can be a bit of a headache to build a migration plan for a live site, but watching it roll out is worth the hassle. We were able to test the migration dozens of times with production data on our staging servers and verify that everything was working properly before the live migration.

When it came time to run the migration on production, we put the site into maintenance mode for a few minutes, merged the migration into the main branch, and watched as the models and data were migrated and copied over in sequence.

In the end, Django's migrations system allowed us to make a stressful and complicated process as routine as any other code deployment.

Authors: Sepehr Sobhani and Jean-Marc Skopek
