Using the Postgres bytea Type with Django
I'm not someone who spends a lot of time or energy on digital images and photography. I'm usually set with my phone's camera app and maybe applying a filter as I upload to Instagram. But when you work at a database company like Crunchy Data, anything in your life can inspire a demo application. How about a simple image upload app built with Django 3.1 and backed by PostgreSQL 13, that takes advantage of the PL/Python procedural language for processing?
You may have seen some of our DevRel team's blog posts earlier this year related to learning Django. We'd also started dipping our toes into PL/Python. Another topic we wanted to explore is how binary files can be saved and retrieved from a PostgreSQL database. We thought it might be neat to build something that would push us to learn even more about these tools. So we decided to try creating a Django app that allows you to upload images to Postgres and do basic image manipulation.
In this post, I'm going to walk you through the upload feature and how files save as raw binary data. Watch out for a follow-up post on how we process that data with PL/Python.
Storing files in Postgres
You've likely encountered this question yourself: how should I store files in Postgres? Should they even be stored in the database in the first place? Some people prefer to store the actual file in the filesystem, and then store a reference to the file in a Postgres data table.
The options for storing binary files in a Postgres database are: using the bytea data type, encoding the binary yourself into text, or using large objects. bytea stores binary strings (a sequence of 8-bit bytes), while large objects in Postgres are stored in a special data structure. Storing binaries in the database means that they can also be handled with transaction processing, which might be essential for some. To dig a bit deeper into this question, you can start with the PostgreSQL Wiki on Storing Binary Files.
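To make the "encode it yourself into text" option a little more concrete, here is a minimal Python sketch (not part of our demo app) that compares raw bytes, which is what you'd hand to a bytea column, with a Base64 text encoding of the same data. The sample bytes and variable names are made up for illustration.

```python
import base64

# Hypothetical sample: 300 bytes standing in for the contents of an image file.
raw = bytes(range(256)) + bytes(44)

# Option 1: keep the raw bytes as they are (what a bytea column would hold).
raw_size = len(raw)

# Option 2: encode the bytes as Base64 text yourself (what a text column would hold).
encoded = base64.b64encode(raw).decode("ascii")
encoded_size = len(encoded)

# Base64 turns every 3 bytes into 4 characters, so the text is about a third larger.
print(raw_size, encoded_size)  # 300 400

# Decoding the text gives the original bytes back.
decoded = base64.b64decode(encoded)
```

The trade-off this hints at: text encoding is easy to move around, but you pay for it in size and in an extra encode/decode step on every read and write.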
For our demo app, we're storing the raw image data with bytea. If you're using a hosted Postgres environment (like Crunchy Bridge), this could be the way to go for saving binaries since you likely won't have access privileges to the filesystem.
I'll also put out another disclaimer that we're running this demo app locally, with a limited number of known users. We're mostly just interested in working with the binary data in Postgres. So if we had to build something production-ready we'd very likely make different decisions than the ones I’ll describe. More on that as we go.
Set up Django data models
We've named the project imageapp, which houses an image_process application. In Postgres, we've created a database also named image_process.
We've got our database backend already set up, so the next thing we did was to create a data model for the image files. If you've read our previous blog posts on Django, you know that it's possible to create the database schema first in PostgreSQL, then get Django to generate the corresponding data models. This time around, we just want to get started by having Django create the database objects for us, so we'll add this to our app's models.py:
image_process/models.py
from django.db import models

class ImageFile(models.Model):
    image = models.FileField()
    image_data = models.BinaryField(null=True)
The ImageFile model is going to represent an image that's uploaded through the app and stored in the database. For now, we have image and image_data as the model's attributes. We went with the FileField type, which is a more general-use field type for file uploads, but ImageField is another supported type. ImageField does require the Pillow library, but it also checks that the uploaded file is a valid image. After this series is published I'm pretty certain I'll go back and try out ImageField.
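ImageField leans on Pillow for its check, which we aren't showing here. As a much cruder, standard-library-only illustration of what "is this upload an image?" can mean, the sketch below just looks at the first few bytes of the data for well-known file signatures. The function name sniff_image_type is made up for this post, and passing this check does not mean the file is a valid image.

```python
from typing import Optional

# Well-known leading bytes ("magic numbers") for a few image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known signature, else None."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```

A check like this only tells you what the file claims to be; a truncated or corrupted image would still get through, which is the kind of thing a real image library is for.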
The image_data attribute is a BinaryField, which will store the raw binary data from the image. Why do we have two separate attributes? For one, BinaryField stores only raw binary data. This means that we'll need additional fields anyway to store the file name, or path, or any other metadata. Also, BinaryField isn't editable by default, so it can't be included in a ModelForm, which we did want to take advantage of. So we're using FileField to handle the file upload via the form, and then in a view function we'll figure out how to get the binary data saved under image_data.
The other thing to note about FileField is that the uploaded file will be saved locally. This means that in addition to the binary data being saved in the database, at this point we're also getting the image files saved to the local filesystem. Far from ideal, but not a dealbreaker for our purposes, so we'll keep going.
Run a database migration
Since we're letting Django handle the corresponding update to our database, we'll run a migration after setting up the model:
python manage.py makemigrations
python manage.py migrate
That should generate a new table in Postgres to store image data; we’ll check with psql:
image_process=# \dt
                  List of relations
 Schema |          Name           | Type  |   Owner
--------+-------------------------+-------+------------
 ...
 public | image_process_imagefile | table | image_user

image_process=# \d image_process_imagefile
                                   Table "public.image_process_imagefile"
   Column   |          Type          | Collation | Nullable |                       Default
------------+------------------------+-----------+----------+-----------------------------------------------------
 id         | integer                |           | not null | nextval('image_process_imagefile_id_seq'::regclass)
 image      | character varying(100) |           | not null |
 image_data | bytea                  |           |          |
Indexes:
    "image_process_imagefile_pkey" PRIMARY KEY, btree (id)
We see that FileField maps to a varchar column in Postgres, while BinaryField maps to a bytea column. (An id column is also created even if it's not in the model definition.)
Define the form
Like I mentioned earlier, we're using the ModelForm helper class, so setting this up is relatively straightforward. Remember that we can't include the BinaryField here; we'll deal with it later in the view.
image_process/forms.py
from django.forms import ModelForm
from image_process.models import ImageFile

class UploadImageForm(ModelForm):
    class Meta:
        model = ImageFile
        fields = ['image']
All we need to get a form to show up in the browser is a template that contains this block of HTML:
image_process/templates/image_process/upload.html
<form action="" enctype="multipart/form-data" method="post">
    {% csrf_token %}
    <table>
        {{ form.as_table }}
    </table>
    <input type="submit" value="Upload" />
</form>
And you get a form with a file upload widget (which you can of course style and customize to your heart's content).
Process raw data in the view
The view is going to be where we'll tell Django to save the raw binary data of the uploaded file to Postgres. Here's what we came up with:
image_process/views.py
from django.shortcuts import render, redirect
from .forms import UploadImageForm
from .models import ImageFile

def upload_file(request):
    if request.method == 'POST':
        # Bind the submitted data and the uploaded file to the form
        form = UploadImageForm(request.POST, request.FILES)
        if form.is_valid():
            # Build the ImageFile instance without saving it yet
            uploaded_img = form.save(commit=False)
            # Read the raw bytes of the upload into the BinaryField
            uploaded_img.image_data = form.cleaned_data['image'].file.read()
            uploaded_img.save()
            return redirect('/')
    else:
        form = UploadImageForm()
    return render(request, 'image_process/upload.html', {'form': form})
Stepping through the view:
- When we hit "Upload" on the form, we bind the uploaded file data (in request.FILES) to the form.
- If the form has valid data, the data is placed in the form's cleaned_data attribute. In this case, the uploaded file data can be accessed with the name of the form field (image) as a dictionary key. The dictionary value itself is an object that has a method for reading the raw bytes of the file.
- The form.save() method creates a new ImageFile object from the data bound to the form. The .save() method also accepts an optional commit parameter.
- When we first call the method, we're creating a new ImageFile instance. But since we're passing commit=False, we're not sending it to the database yet, because the form as it's set up will populate only the image attribute.
- So, we have to explicitly read the bytes data from the uploaded image into the image_data attribute of ImageFile. Only then are we saving the new instance to the database, by calling .save() on the instance itself, this time without the optional parameter.
This is just one example of how you might handle uploaded files. Check out the official Django docs on file uploads to learn about the default behavior and how it can be customized. For example, you can retrieve the file data from request.FILES instead of the validated data in cleaned_data. Django also highly recommends reading uploaded file data in chunks so as not to overwhelm the system, especially with large files.
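Django's uploaded file objects have a chunks() method for exactly this. As a rough, framework-free sketch of the same idea, the helper below reads any binary file-like object in fixed-size pieces. The name read_in_chunks, the 64 KB chunk size, and the io.BytesIO stand-in for an upload are our own illustration, not something from the demo app.

```python
import io
from typing import BinaryIO, Iterator

def read_in_chunks(fileobj: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

# io.BytesIO stands in for an uploaded file here.
fake_upload = io.BytesIO(b"x" * 150_000)
chunks = list(read_in_chunks(fake_upload))
total_bytes = sum(len(c) for c in chunks)
```

Keep in mind that our view still needs the complete bytes to assign to image_data, so chunked reading pays off most when you're writing the upload to disk or passing it along piece by piece.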
There are many ways to implement image uploads with Django: you might look into just using the basic Form class, or perhaps you'll want to use the class-based FormView. But like I mentioned earlier, we're just interested in quickly seeing how we can process and save the raw data, so we're OK with this view for now.
How image data saves in Postgres
In our data table, the image column contains a reference to where the file is stored (in our case it's just the name of the file). The image_data column will hold a binary string that starts like this:
\xffd8ffe000104a46494600010101012c012c0000…
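That value is bytea in Postgres's hex output format: a \x prefix followed by two hex digits per byte. A small Python sketch (the helper names are ours) shows the round trip between that text and actual bytes, and what the first few bytes say about the file.

```python
def to_bytea_hex(data: bytes) -> str:
    """Format bytes the way Postgres prints bytea in hex output mode."""
    return "\\x" + data.hex()

def from_bytea_hex(text: str) -> bytes:
    """Parse a hex-format bytea string (including its leading backslash-x) into bytes."""
    if not text.startswith("\\x"):
        raise ValueError("expected a string starting with a backslash and x")
    return bytes.fromhex(text[2:])

# Only the visible prefix of the value shown above; the rest is elided in the post.
prefix = from_bytea_hex(r"\xffd8ffe000104a46494600010101012c012c0000")

is_jpeg = prefix.startswith(b"\xff\xd8\xff")  # JPEG start-of-image marker
has_jfif = prefix[6:10] == b"JFIF"            # JFIF identifier in the first segment
```

So even from the first twenty bytes you can tell that the stored value is a JPEG file in JFIF format.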
I want to point out again here that Django's official documentation for BinaryField recommends against storing the actual files as database entries.
And that's basically it for Part 1. We'll talk more about retrieving and using the saved raw data in PL/Python in the next post.
Again, here's the link to the Postgres Wiki article so you can make sure you're aware of the different options (with caveats) for storing binary files. And to sum everything up real quick:
- Understand why you may or may not want to store a file in the database and not the filesystem
- If you need to take this route: Django has a BinaryField type that maps to the bytea type in Postgres for storing binary data
- For file uploads, you can access the raw binary data in the view and do something with it. It may depend on how exactly your data model and form are set up, but this is likely where you'll tell Django how to save the raw data to the bytea column in your Postgres table. (And you have the power to customize how the upload should be handled.)
We hope this example is helpful! If you're a Django user and also have to save files in the database, how would you implement this? Let us know in the comments below.