Handling files in Django is pretty easy: you can add them to a model with a single line (for a brush-up on Django models, you can check out our article on handling data in web frameworks), and the framework will handle everything else for you, including validation, uploading and type checking. Even serving them takes very little effort.

However, there is one thing that Django no longer does starting with version 1.3: automatically deleting files from a model when the instance is deleted.
There were good reasons for this decision: in certain cases (such as rolled-back transactions, or a file being referenced from multiple models) the old behaviour was prone to data loss.

Nowadays, many projects use AWS S3, Google Cloud Storage, Microsoft Azure or one of the many other cloud-based solutions for storing media files, which spares them the hassle of managing storage and the worry of one day running out of space. So why even care that Django doesn’t delete files that are no longer used? Well, first off, not everyone uses “the cloud” to store their files (maybe for security concerns, or maybe just because they don’t want to). Secondly, those who do use cloud-based storage know that, even though there is theoretically no size limit, they pay for the space they use, so unused files that never get deleted can make the costs grow quite large.

So let’s dive right in and look at the possible solutions for removing those nasty unused files.

1. Creating a custom management command


This first solution is actually the one suggested in the Django documentation. It involves writing a custom management command that walks the media directory tree and checks, for each file, whether it is still referenced from the database. Once the command has been written and tested, you can schedule it to run on a regular basis, using cron or Celery.

The algorithm is quite simple and consists of four steps:

  1. We collect every reference to a media file from the database and store them in a set.
  2. We recursively build a second set containing all the files physically present in the MEDIA_ROOT directory.
  3. The difference between these sets gives the files that are physically present but no longer referenced from the database; these are the files we will delete.
  4. To complete the cleanup, we traverse the tree recursively once more and delete all empty directories.
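
The four steps above can be sketched in Python as follows. This is a minimal sketch rather than the article’s original listing, and it rests on a few assumptions: media files live on the local disk under MEDIA_ROOT (as with Django’s default FileSystemStorage), the function names are hypothetical, and step 1 is left to the caller, who passes in the file names stored in the database. For a hypothetical Document model with a file field, those names could come from Document.objects.values_list("file", flat=True), repeated for every file field in the project. In a real project, cleanup_media() would be called from the handle() method of a BaseCommand subclass, with settings.MEDIA_ROOT as the root.

```python
import os
from pathlib import Path
from typing import Iterable, Set


def files_on_disk(media_root: Path) -> Set[str]:
    """Step 2: every file under media_root, as a path relative to it.

    Forward slashes are used because that is how Django stores file names.
    """
    return {
        path.relative_to(media_root).as_posix()
        for path in media_root.rglob("*")
        if path.is_file()
    }


def find_orphans(media_root: Path, referenced: Iterable[str]) -> Set[str]:
    """Step 3: files present on disk but not referenced from the database."""
    # Blank file fields come back as "" (or None) and must not match anything.
    referenced_names = {name for name in referenced if name}
    return files_on_disk(media_root) - referenced_names


def remove_empty_dirs(media_root: Path) -> None:
    """Step 4: delete empty directories, deepest first; keep media_root itself."""
    for dirpath, _dirnames, _filenames in os.walk(media_root, topdown=False):
        current = Path(dirpath)
        # Check the live contents: child directories may have just been removed.
        if current != media_root and not any(current.iterdir()):
            current.rmdir()


def cleanup_media(media_root, referenced: Iterable[str], dry_run: bool = True) -> Set[str]:
    """Return the orphaned file names and, unless dry_run is set, delete them."""
    root = Path(media_root)
    orphans = find_orphans(root, referenced)
    if not dry_run:
        for name in orphans:
            (root / name).unlink()
        remove_empty_dirs(root)
    return orphans
```

Defaulting to dry_run=True is a deliberate choice in this sketch: a first run only reports what would be deleted, which makes it easy to spot files referenced from somewhere the step 1 query did not cover before anything is actually removed.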


2. Using signals


This is my favourite way of doing it, because it provides more control than the previous solution. We have used Django signals before and written about them on this blog. As for using signals to delete unused media files, the comparison of advantages and disadvantages is left for the end of this article.

There are two cases in which we will want to delete a file:

  1. When the model instance to which the file belongs is deleted: here we can simply use the post_delete signal, which is only sent once the instance has been deleted from the database. This part is pretty straightforward.
  2. When a file is being replaced: in this case we must delete the old file and keep the new one, but only if the save succeeds. The simplest approach would be to do it in the pre_save signal, where we can still retrieve the old file from the database. However, if any error occurs while the instance is being saved, the database would keep pointing to the old file, which would already be gone for good. So the deletion has to happen in the post_save signal, once we know the instance was successfully saved to the database. This has a big caveat too: in post_save we no longer have access to the previous value of the file field, so we no longer know which file(s) to delete. The final solution is to use the pre_save signal to memorise the old value and to perform the actual deletion in the post_save signal, keeping the old values in a temporary cache on the model instance.
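
The two cases above could be handled along the lines of the following minimal sketch (not the article’s original listing). It assumes the names of the model’s file fields are listed in a hypothetical FILE_FIELDS tuple; the receiver names and the _old_files cache attribute are hypothetical too. The functions only rely on standard FieldFile behaviour (its truthiness, its name attribute and delete(save=False)) and on the model’s _default_manager.

```python
from typing import Any, Dict

# Hypothetical: names of the FileField/ImageField attributes on your model.
FILE_FIELDS = ("document",)


def delete_files_on_delete(sender: Any, instance: Any, **kwargs: Any) -> None:
    """post_delete receiver: the row is gone, so its files can be removed."""
    for name in FILE_FIELDS:
        field_file = getattr(instance, name)
        if field_file:  # a FieldFile is falsy when no file is set
            # save=False: the instance no longer exists, so don't re-save it.
            field_file.delete(save=False)


def remember_old_files(sender: Any, instance: Any, **kwargs: Any) -> None:
    """pre_save receiver: cache the files currently stored in the database."""
    instance._old_files = {}  # temporary cache, consumed by post_save
    if instance.pk is None:
        return  # new instance: nothing can be replaced yet
    old = sender._default_manager.filter(pk=instance.pk).first()
    if old is None:
        return  # primary key already set, but no row saved yet
    instance._old_files = {name: getattr(old, name) for name in FILE_FIELDS}


def delete_replaced_files(sender: Any, instance: Any, **kwargs: Any) -> None:
    """post_save receiver: the save succeeded, so replaced files can go."""
    old_files: Dict[str, Any] = getattr(instance, "_old_files", {})
    for name, old_file in old_files.items():
        new_file = getattr(instance, name)
        # Delete only if there was a file and it is no longer the one in use.
        if old_file and old_file.name != new_file.name:
            old_file.delete(save=False)
    instance._old_files = {}
```

To activate them, each receiver would be connected to your model with the signals from django.db.models.signals, for example post_delete.connect(delete_files_on_delete, sender=Document), pre_save.connect(remember_old_files, sender=Document) and post_save.connect(delete_replaced_files, sender=Document), where Document stands for your own model; this is usually done in AppConfig.ready(). If your saves run inside a transaction, wrapping the actual deletion in django.db.transaction.on_commit() postpones it until the commit succeeds.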

In case you are wondering “Why hasn’t anyone made a library out of this?”, someone actually has. In fact, there are several packages that delete a file once it is no longer used, such as django-cleanup. It is up to you to decide what is best for your project.

Comparison


There are other ways to delete orphan files with Django that are not presented in this article. For example, if you know for sure your project will only ever have a few file fields, you may prefer a more tailored, per-field approach. Or perhaps you want to soften the effects of deleting these files by moving them to temporary storage before permanently deleting them.
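
The second idea, moving files to temporary storage first, could look like the minimal sketch below: instead of deleting orphaned files straight away, it moves them into a separate quarantine directory and only purges them once a grace period has passed. It assumes local files (as with FileSystemStorage) and a list of orphan names relative to the media root, such as the difference computed by a cleanup command; the function names and the quarantine layout are hypothetical.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Iterable, List


def quarantine_orphans(media_root, quarantine_root, orphans: Iterable[str]) -> List[Path]:
    """Move orphaned files into quarantine_root instead of deleting them.

    Names are relative to media_root and the relative layout is preserved,
    so a file removed by mistake can simply be moved back.
    """
    media, quarantine = Path(media_root), Path(quarantine_root)
    moved = []
    for name in orphans:
        target = quarantine / name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(media / name), str(target))
        # shutil.move preserves the original mtime; reset it to "now" so the
        # grace period in purge_quarantine() counts from the quarantine time.
        os.utime(target)
        moved.append(target)
    return moved


def purge_quarantine(quarantine_root, max_age_seconds: float) -> List[Path]:
    """Permanently delete quarantined files older than max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    purged = []
    # Materialise the listing first so we never delete while iterating.
    for path in list(Path(quarantine_root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            purged.append(path)
    return purged
```

A scheduled job could then call purge_quarantine() with a grace period of your choosing (for example 30 * 24 * 3600 seconds for 30 days), which leaves time to restore a file if a missing reference is discovered.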

For now, let’s compare these methods by listing their pluses and minuses:

Management command

+ A custom management command only runs every so often, and it can do so asynchronously. Hence, this solution can result in better overall performance, since it doesn’t intervene in the request-response cycle.
+ If executed manually (that is, not from a cron job), it can help prevent the loss of files caused by migrations or transactions.
+ By checking everything in the database, we make sure that no file is deleted as long as at least one reference to it exists (this takes care of the problem with multiple references).

- Customizing the command’s behaviour through arguments is slightly more difficult, and passing those arguments from a cron job is rather ugly.
- If the command is not executed often enough, you could still run into storage size problems.
- It does not handle different storage backends (at least not in the implementation described here).
- The command’s running time depends on the size of the database and of the media folder, which can quickly become problematic as they grow.

Signals

+ Having everything implemented through signals allows for a high degree of control over the files being replaced: you can easily add extra logic which helps decide whether a file should be deleted or not (e.g. based on user account type).
+ This solution integrates nicely into the application flow and doesn’t take long to run, since everything is done on the spot. It is also much easier to implement for most programmers, who are already accustomed to using Django signals.

- Even though we took care to handle file replacement in the post_save signal, errors can still occur after this signal has been handled (for example, if the surrounding transaction is rolled back), which would result in an ‘unsuccessful’ save whose old file has already been deleted.
- It does not account for multiple references to the same file (although, unless you modify your database directly, this should never happen).

Conclusion


Before trying to implement a mechanism for deleting unused media files, you should always:

  • consider why the Django team decided to remove this feature in the first place
  • check that the solution you choose doesn’t introduce more problems than it solves
  • ensure no unwanted data loss is possible (test your code!)

It is up to you to decide whether or not you need this behaviour and how best to implement it. We hope the possible solutions presented in this article will be of help to some of our readers. We would love to hear your opinions, questions and suggestions, so don’t hesitate to contact us in the comment section. Feel free to check out our other Python articles, including our guidelines for solving Django migration conflicts.
