Django Migrations and How to Manage Conflicts
Migrations are one of Django’s most useful features, but for me, personally, dealing with model changes used to be a dreadful task. Despite reading the docs, I was still scared of migration conflicts, of losing data, of having to modify migration files by hand, and so on. The thing is, migrations are awesome and helpful, and once you understand them, none of the things mentioned above will be a problem.
I haven’t found an article or piece of documentation that gathers, in one place, all the methods for fixing conflicts, and since nobody searches the second Google page (that’s where you hide bodies), I will try to shed some light on these matters. Mainly, I will explain where you can find the migrations in your application, how to get out of migration conflicts, and a little bit about data migrations. I will assume you have some experience with Django, Python and Git.
Here is a short definition of migrations, from the Django documentation:
Migrations are Django’s way of propagating changes you make to your models (adding a field, deleting a model, etc.) into your database schema. They’re designed to be mostly automatic, but you’ll need to know when to make migrations, when to run them, and the common problems you might run into.
You can manage the database with only a handful of commands, whether you choose PostgreSQL, MySQL or SQLite. I will mostly talk about the makemigrations command, which creates new migrations based on the changes you have made to your models, and the migrate command, which applies migrations, unapplies them and lists their status.
Where are my migrations?
In your project, you can find your migration files (.py files) inside each app’s migrations folder. This folder must contain an __init__.py file, even if you do not have an initial migration yet.
Every installed app listed in your settings file (INSTALLED_APPS) ships its own migrations in the same way. For instance, you can find the migrations for the User model in …/lib/python2.7/site-packages/django/contrib/auth/migrations.
In your database, you can see which migrations have been applied by querying the django_migrations table. This comes in handy when you have switched between branches with different migrations and forgot which was where.
my_data=# select * from django_migrations;
 id |      app      |          name           |            applied
----+---------------+-------------------------+-------------------------------
 .. | ...           | ...                     | ...
 10 | myapp         | 0001_initial            | 2016-03-17 07:22:30.329448+00
 11 | myapp         | 0002_auto_20160316_0909 | 2016-03-17 07:22:30.956985+00
 12 | myapp         | 0003_auto_20160318_1345 | 2016-03-18 13:45:23.895839+00
 .. | ...           | ...                     | ...
(16 rows)

my_data=#
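If you would rather check this from Python than from psql, it is a plain SELECT. The sketch below uses the standard-library sqlite3 module against an in-memory database whose django_migrations table is created by hand with the same four columns shown above (id, app, name, applied); against a real project you would connect to the project’s own database instead. The helper name applied_migrations is hypothetical, and the rows are made up to mirror the listing above. Recent Django versions can also show this information with python manage.py showmigrations.

```python
import sqlite3
from typing import List


def applied_migrations(conn: sqlite3.Connection, app: str) -> List[str]:
    """Return the migration names recorded as applied for `app`, oldest first."""
    rows = conn.execute(
        "SELECT name FROM django_migrations WHERE app = ? ORDER BY id",
        (app,),
    )
    return [name for (name,) in rows]


# Stand-in for a real project database: an in-memory table with the same columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
conn.executemany(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
    [
        ("myapp", "0001_initial", "2016-03-17 07:22:30"),
        ("myapp", "0002_auto_20160316_0909", "2016-03-17 07:22:30"),
        ("auth", "0001_initial", "2016-03-17 07:22:31"),
        ("myapp", "0003_auto_20160318_1345", "2016-03-18 13:45:23"),
    ],
)

print(applied_migrations(conn, "myapp"))
# ['0001_initial', '0002_auto_20160316_0909', '0003_auto_20160318_1345']
```

Comparing this list with the files in the migrations folder of your current branch tells you quickly which migrations came from another branch.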
First of all, why keep migrations in my project?
Some of you might say: why not Git-ignore all the migrations inside a Python/Django project and let each developer create their own migration files? This way we could avoid unnecessary headaches. Well, this is one of the reasons for writing this article.
Before any further digging, I want to make it clear that migrations must be generated each time you modify your models and must be committed and pushed into the repository, so that all the developers have the same set of migrations. I can’t stress this enough.
Why is this so important? Well, imagine the migrations are not in the repository. Each team member generates the migration files locally. Then comes the time when I want my project in production. I deploy, create the initial migration and migrate my database. After some time, I make a model change and deploy again, so I have to make a new migration on this production server. But what happens if I now want a second production server? I deploy, create the initial migration and migrate my database. Now the first production server has two migrations and the second one has only one. This is a big inconsistency: I can no longer keep track of the database changes and I cannot execute any rollbacks. Basically, if I want a database rollback now, I have to handle it separately on every single one of my production servers.
So please keep the migrations in your project. You might encounter some (manageable) conflicts, but if you want database versioning or want to be able to roll back the database, make your life as a developer easier by tracking migrations.
Suppose we have a Django project with an initial migration, and we use Git. We use Git, ergo we use branches. We use branches, ergo we might have migration conflicts when merging them. Of course the team should know who is working on what and try to avoid working on the same models. Of course team coordination is important. But you cannot always avoid migration conflicts. So do not freak out and drop the database, as I did so many times! Conflicts can be easily repaired, and to do just that, I will explain a few simple but very useful methods. You can choose the one that fits your needs best. If you need to brush up on some Git first, you can try reading our “Git tips and tricks” article.
We assume that the project already has an initial migration (Django compares your models with the state described by the existing migration files, not with the current state of the database) and that nobody modifies the database manually.
Say we have a UserProfile model and there are two branches:
- branch_1, where another developer added an “address” field to the UserProfile model, which produced the migration 0003_userprofile_address.py
- branch_2, where I want to add an “age” field to the UserProfile model, which produced the migration 0003_userprofile_age.py
My branch is branch_2 and I want to merge branch_1 into it. After the Git merge, branch_2 contains my migration and also 0003_userprofile_address.py from branch_1. The problem is that both migrations alter the same model and both depend on the same parent migration, so myapp no longer has a single latest migration; the shared “0003_” prefix is just the visible symptom.
There are three possible solutions. I personally recommend the first two, and I advise you to avoid the third one when possible.
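Under the hood, Django treats each app’s migrations as a graph in which every migration names its parents in its dependencies attribute. After the merge, both 0003_ files point at the same 0002_ parent, so the app ends up with two leaf nodes (migrations that nothing else depends on), and that is what migrate reports as a conflict. The sketch below is a simplified, hypothetical model of that check in plain Python; Django’s real check (MigrationLoader.detect_conflicts) works on its loaded migration graph and also deals with cross-app dependencies. The graph is hand-written for this example, with the 0002_ name borrowed from the django_migrations listing above.

```python
from typing import Dict, List


def leaf_nodes(graph: Dict[str, List[str]]) -> List[str]:
    """Return migrations that no other migration depends on, sorted for stable output."""
    parents = {parent for deps in graph.values() for parent in deps}
    return sorted(name for name in graph if name not in parents)


def has_conflict(graph: Dict[str, List[str]]) -> bool:
    """An app is in conflict when its history ends in more than one leaf."""
    return len(leaf_nodes(graph)) > 1


# branch_2 after merging branch_1: both 0003 migrations depend on the same parent.
myapp_graph = {
    "0001_initial": [],
    "0002_auto_20160316_0909": ["0001_initial"],
    "0003_userprofile_age": ["0002_auto_20160316_0909"],
    "0003_userprofile_address": ["0002_auto_20160316_0909"],
}

print(leaf_nodes(myapp_graph))    # ['0003_userprofile_address', '0003_userprofile_age']
print(has_conflict(myapp_graph))  # True
```

Adding one more node that depends on both 0003_ migrations leaves a single leaf again, which is exactly the job of the merge migration created by makemigrations --merge.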
Method number 1: use --merge
You can try this method first, at any time. It is easy, since Django handles the merge automatically. Keep in mind, though, that it only works for fairly simple model changes, so a more experienced developer can often tell in advance whether it will fail.
So, in order to allow Django to merge the migrations for you, you should follow these steps:
- try executing python manage.py migrate (at this point Django will detect the conflict and tell you to execute python manage.py makemigrations --merge)
- execute python manage.py makemigrations --merge and the migrations will automatically be merged; you will see that a new migration, 0004_merge.py, is created inside the migrations folder
- execute python manage.py migrate
$ python manage.py migrate
CommandError: Conflicting migrations detected (0003_userprofile_age, 0003_userprofile_address in myapp).
To fix them run 'python manage.py makemigrations --merge'
$ python manage.py makemigrations --merge
Merging myapp
  Branch 0003_userprofile_age
    - Add field age to userprofile
  Branch 0003_userprofile_address
    - Add field address to userprofile

Merging will only work if the operations printed above do not conflict with each other (working on different fields or models)
Do you want to merge these migration branches? [y/N] y

Created new merge migration .../migrations/0004_merge.py
$ python manage.py migrate
Operations to perform:
  Synchronize unmigrated apps: ...
  Apply all migrations: ... myapp, ...
Synchronizing apps without migrations:
  Creating tables...
  Installing custom SQL...
  Installing indexes...
Running migrations:
  Applying myapp.0003_userprofile_age... OK
  Applying myapp.0003_userprofile_address... OK
  Applying myapp.0004_merge... OK
$
Note the message “Merging will only work if the operations printed above do not conflict with each other (working on different fields or models)”. If the modifications are more complex, Django will probably not merge your migrations correctly and you will need to use another method.
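Before answering y, it is worth checking that warning yourself. The plain-Python sketch below is a hypothetical helper, not something Django provides: it represents each branch’s operations as (model, field) pairs, here taken from the two migrations in our example, and reports operations from one branch that collide with the other. An operation on a whole model (deleting or renaming it, say) is written with a field of None and treated as touching every field of that model. An empty result matches the “different fields or models” case from the warning; anything else suggests falling back to one of the other methods.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (model, field) pairs; field=None means the operation affects the whole model.
Operation = Tuple[str, Optional[str]]


def overlapping(branch_a: Iterable[Operation], branch_b: Iterable[Operation]) -> List[Operation]:
    """Return the operations of branch_a that collide with something in branch_b."""
    b_ops: Set[Operation] = set(branch_b)
    b_models = {model for model, _ in b_ops}
    b_whole_models = {model for model, field in b_ops if field is None}
    collisions = []
    for model, field in branch_a:
        if field is None:
            # A whole-model change collides with any change to that model.
            hit = model in b_models
        else:
            # A field change collides with the same field or a whole-model change.
            hit = (model, field) in b_ops or model in b_whole_models
        if hit:
            collisions.append((model, field))
    return collisions


age_branch = [("userprofile", "age")]          # 0003_userprofile_age
address_branch = [("userprofile", "address")]  # 0003_userprofile_address

print(overlapping(age_branch, address_branch))            # []
print(overlapping(age_branch, [("userprofile", "age")]))  # [('userprofile', 'age')]
```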
Method number 2: roll back and then migrate again
Choose this method if the first one fails or if you would rather not accumulate many migration files in your application (even though the Django documentation states that an app can have any number of migrations).
- roll back to the most recent migration the two branches have in common, using the command: python manage.py migrate myapp my_most_recent_common_migration
- you can either:
- temporarily remove your migration, execute python manage.py migrate (this applies the migration from the other branch), then recreate your migration with python manage.py makemigrations and re-execute python manage.py migrate. Use this case if the migrations refer to different models and you want a separate migration file for each of them.
- remove both migrations and execute python manage.py makemigrations and python manage.py migrate, obtaining one migration that combines the changes from 0003_userprofile_age.py and 0003_userprofile_address.py. Use this case if the migrations refer to the same model and you want all current changes in one single migration. Be careful! If you use this case and remove the migration from the other branch, make sure nobody has it or that everyone knows you deleted it! If you remove it while other people use it, you are going to cause them some nasty problems. This should be a general rule: do not modify or remove migrations that other people use!
- rename one of the migrations from “0003_” to “0004_” (not mandatory, but recommended), change its dependencies attribute to point to the other “0003_” migration and execute python manage.py migrate. This case is an alternative to the first one, BUT you have to edit the dependencies attribute by hand so that it points to the latest migration. Whether you prefer to roll back and recreate the migration or to edit it manually is a matter of personal taste. I, for one, prefer the first option.
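Picking the rollback target comes down to comparing the two histories. Assuming each branch’s migrations for myapp form a straight line (listed oldest first, like the django_migrations output earlier), the most recent common migration is the last entry of their shared prefix, and the migrations that migrate myapp <target> would unapply on your branch are the ones after it, newest first. The helper names are hypothetical; Django itself works this out from the dependency graph rather than from lists like these.

```python
from typing import List, Optional, Sequence


def most_recent_common(branch_a: Sequence[str], branch_b: Sequence[str]) -> Optional[str]:
    """Return the last migration both linear histories share, or None if they share none."""
    common: Optional[str] = None
    for a, b in zip(branch_a, branch_b):
        if a != b:
            break  # the histories diverge here
        common = a
    return common


def to_unapply(history: Sequence[str], target: str) -> List[str]:
    """Return the migrations after `target`, newest first (the rollback order)."""
    items = list(history)
    index = items.index(target)  # raises ValueError if target is not in the history
    return items[index + 1:][::-1]


branch_1 = ["0001_initial", "0002_auto_20160316_0909", "0003_userprofile_address"]
branch_2 = ["0001_initial", "0002_auto_20160316_0909", "0003_userprofile_age"]

target = most_recent_common(branch_1, branch_2)
print(target)                        # 0002_auto_20160316_0909
print(to_unapply(branch_2, target))  # ['0003_userprofile_age']
```

With these example histories the target is 0002_auto_20160316_0909, which is the argument passed to migrate in the transcript that follows.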
Before executing the commands in the snippet below, I removed both migrations (so I followed the second case explained above, since both branches were mine and no one else used the other migration).
$ python manage.py migrate myapp 0002_auto_20160316_0909
Operations to perform:
  Target specific migration: 0002_auto_20160316_0909, from myapp
Running migrations:
  Unapplying myapp.0003_userprofile_age... OK
$ python manage.py makemigrations
Migrations for 'myapp':
  0003_auto_20160318_1345.py:
    - Add field address to userprofile
    - Add field age to userprofile
$ python manage.py migrate
Operations to perform:
  Synchronize unmigrated apps: ...
  Apply all migrations: ... myapp, ...
Synchronizing apps without migrations:
  Creating tables...
  Installing custom SQL...
  Installing indexes...
Running migrations:
  Applying myapp.0003_auto_20160318_1345... OK
$
Be careful if the migration from branch_1 changed the same field as your migration. In that case, decide which change you want to keep and redo your migration.
Method number 3: manually modify the migrations
This is rarely needed, but if you have reached this point and have no other choice, read the Django documentation on writing migrations. It is concise and will take you through the main components of a migration.
Data migrations are another one of the many reasons migrations must be part of a project. If a requirement says, for instance, that a many-to-many relationship must become a one-to-many relationship, or that a table must be split into two tables, we of course apply a migration. But what happens to the data? We still need all the data currently in the database. This is where data migrations come in: you write a custom migration containing the code that adapts your data to the new schema, and then you execute python manage.py migrate. That’s it! Your data now fits the new schema.
What a data migration explicitly needs is a dependency: the name of the latest migration of the app, which has to be set correctly (python manage.py makemigrations --empty myapp fills it in for you, or you write it by hand). But if our production servers have different sets of migrations, we have to set that dependency separately for every server. This is tedious, and being able to avoid it makes the migration mechanism even more valuable.
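To make the idea concrete, here is a data migration reduced to plain SQL: a hypothetical userprofile table stores the address as a column, and the goal is to move addresses into their own address table without losing anything. The sketch uses the standard-library sqlite3 module so it runs on its own; in a real Django project the same step would live in a migration file, usually as a function passed to migrations.RunPython that fetches historical models with apps.get_model('myapp', 'UserProfile'), and that file’s dependencies would name the latest schema migration of the app. The table names, column names and sample rows are assumptions for illustration.

```python
import sqlite3


def split_addresses(conn: sqlite3.Connection) -> None:
    """Move userprofile.address values into a separate address table."""
    with conn:  # commits on success, rolls back uncommitted changes on error
        conn.execute(
            "CREATE TABLE address ("
            "id INTEGER PRIMARY KEY, "
            "profile_id INTEGER UNIQUE REFERENCES userprofile(id), "
            "street TEXT NOT NULL)"
        )
        # The data step: copy existing values into the new schema.
        conn.execute(
            "INSERT INTO address (profile_id, street) "
            "SELECT id, address FROM userprofile WHERE address IS NOT NULL"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userprofile (id INTEGER PRIMARY KEY, age INTEGER, address TEXT)")
conn.executemany(
    "INSERT INTO userprofile (age, address) VALUES (?, ?)",
    [(30, "1 Main St"), (25, None), (41, "9 Side Rd")],
)
conn.commit()

split_addresses(conn)
print(conn.execute("SELECT profile_id, street FROM address ORDER BY profile_id").fetchall())
# [(1, '1 Main St'), (3, '9 Side Rd')]
```

The old address column is deliberately left in place: removing it would be a separate schema migration applied after this one, which keeps the data step easy to reason about.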
Also, a fixture and a data migration are different things. Fixtures are used to load default data into the database, while data migrations are typically used to change data that is already there (although they can create rows as well). A fixture is a file, usually JSON, that lives inside a fixtures folder and is loaded with the command python manage.py loaddata myapp/fixtures/my_fixture.json. This way, you have a set of objects that populate your database by default. A data migration, on the other hand, is a migration in itself and lives inside the migrations folder.
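For comparison, a fixture is just serialized objects. Django’s JSON fixture format is a list of entries with model, pk and fields keys; the sketch below builds one with the standard json module so you can see the shape. The myapp.userprofile model label, the field values and the helper names are assumptions for illustration. Django can also produce such a file from existing data with python manage.py dumpdata.

```python
import json
from typing import Any, Dict, List


def fixture_entry(model: str, pk: int, **fields: Any) -> Dict[str, Any]:
    """Build one object in Django's JSON fixture shape: model label, primary key, field values."""
    return {"model": model, "pk": pk, "fields": fields}


def to_fixture_json(entries: List[Dict[str, Any]]) -> str:
    """Serialize the entries as a JSON list, the top-level shape loaddata reads."""
    return json.dumps(entries, indent=2)


entries = [
    fixture_entry("myapp.userprofile", 1, age=30, address="1 Main St"),
    fixture_entry("myapp.userprofile", 2, age=25, address=""),
]

print(json.dumps(entries[0]))
# {"model": "myapp.userprofile", "pk": 1, "fields": {"age": 30, "address": "1 Main St"}}
```

Saving the output of to_fixture_json(entries) as myapp/fixtures/my_fixture.json gives you a file that the loaddata command above can load.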
Migrations are a very powerful mechanism for database versioning. You must always consider using migrations when working with Django and databases.
Even if you don’t have a programming background or if you have worked with other frameworks that do not use a similar mechanism, experiment and aim to understand the logic behind them. They are not as complicated as they seem at first and all the problems they cause can be easily solved.