How to group rows by time in PostgreSQL (and in Django)

Sometimes you need to group rows by a time interval, say, one value per minute or per hour. In this post, we will look at how to do that with vanilla PostgreSQL and with the Django ORM.

When do you need it? Imagine you have a table that stores temperature measurements, with a new reading taken every 10 seconds. In one hour it collects 360 measurements, which adds up to a lot if your script runs for several months. Now you want to render the measurements as data points on a line chart, and you also want your users to be able to thin out the data set to one point per hour, day, or week. That is where this snippet comes in!

Setting up the database

Before we start experimenting with the SQL query, let’s set up a database table and fill it with a sample data set. I assume you have already created a database.

create table temperatures (
    value real,
    measured_at timestamp with time zone
);

I omit indexes and the primary key here to keep the example simple.

So we have a table. Let’s fill it with sample temperature measurements.

insert into temperatures (value, measured_at)
select
    random() * 40 + 1,
    generate_series(
        '2021-05-01 00:00:00'::timestamp,
        '2021-05-31 23:59:59'::timestamp,
        '10 second'
    );

This query will create 267840 measurements. It is a pretty large data set for our example.
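As a quick sanity check on that row count, here is a small Python sketch that counts how many values a series with an inclusive end point would contain. The helper name count_series is made up for this illustration and is not part of PostgreSQL or any library.

```python
from datetime import datetime, timedelta


def count_series(start: datetime, stop: datetime, step: timedelta) -> int:
    """Count the values in a series from start to stop (inclusive) with the given step.

    Hypothetical helper: it only mirrors the arithmetic, it does not talk to a database.
    """
    if stop < start:
        return 0
    # Number of whole steps that fit into the range, plus the starting value itself.
    return (stop - start) // step + 1


start = datetime(2021, 5, 1, 0, 0, 0)
stop = datetime(2021, 5, 31, 23, 59, 59)
rows = count_series(start, stop, timedelta(seconds=10))
print(rows)  # 267840, i.e. 31 days * 24 hours * 360 readings per hour
```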

Let’s check what we have in the table:

select * from temperatures limit 10

SQL to group by time

Now we want to select only one measurement per hour. We can do this by truncating the date and using the result in the distinct on expression.

select distinct on (key) *, date_trunc('hour', measured_at) as key
from temperatures
order by key, measured_at  -- keep the earliest row of each hour
limit 20;

You see that PostgreSQL returned only one value per hour. That is what we want! In the same way, you can group values by week, month, or other period supported by the database.
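To make the idea behind the query more tangible, here is a rough pure-Python sketch of the same logic: truncate each timestamp to the hour and keep one row per truncated value. The function names trunc_hour and first_per_hour are invented for this illustration. The sketch keeps the earliest row in each hour, which is only one possible choice; without an order by, PostgreSQL makes no promise about which row of each group it returns.

```python
from datetime import datetime, timedelta


def trunc_hour(ts: datetime) -> datetime:
    """Roughly what date_trunc('hour', ts) does: zero out everything below the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def first_per_hour(rows):
    """Keep the earliest (value, measured_at) pair for every hour.

    Hypothetical helper that imitates "distinct on" over the truncated timestamp.
    """
    picked = {}
    # Sort by timestamp so the first row seen for each hour is the earliest one.
    for value, measured_at in sorted(rows, key=lambda row: row[1]):
        picked.setdefault(trunc_hour(measured_at), (value, measured_at))
    return [picked[key] for key in sorted(picked)]


# Two hours of made-up readings, one every 10 seconds.
start = datetime(2021, 5, 1)
sample = [(float(i), start + timedelta(seconds=10 * i)) for i in range(720)]

result = first_per_hour(sample)
for value, measured_at in result:
    print(measured_at, value)
```

Swapping trunc_hour for a function that truncates to the day or the week would give one row per day or per week in the same way.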

Use the Django ORM to group rows by time

We can do the same with the Django ORM. Assuming we have this model for our table

from django.db import models

class Temperature(models.Model):
    value = models.FloatField()
    measured_at = models.DateTimeField()

you can write the following query to fetch one value per hour (or per any other period you wish). Keep in mind that passing field names to distinct() works only on PostgreSQL.

from django.db.models.functions import Trunc

Temperature.objects.annotate(key=Trunc('measured_at', 'hour')).distinct('key').order_by('key', 'measured_at')

That’s it!

Originally posted on my blog.
