Snappea: A Simple Task Queue for Python

Klaas van Schelven; August 30 - 12 min read

Since Bugsink is only available self-hosted (we don’t host it for you), we are obsessed with making installation as easy as possible. In an earlier article, we talked about how we do this by removing as many moving parts as possible.

In this article, we’ll dive deeper into one of the parts we cut out: Celery, the go-to solution for background tasks in Python applications. We replaced it with a simple task queue that we built ourselves, called Snappea.

Are the results generalizable to your use case? Maybe not in the sense that Snappea would be a drop-in replacement for your particular needs. But perhaps you’ll find inspiration in our approach and adapt it to your own requirements. Worst case, you’ll get a glimpse into the inner workings of a custom task queue and learn something new.

Background work

Most web applications have tasks that are too long-running to be completed within the main HTTP request-response loop. These tasks, often referred to as “background jobs” or “asynchronous tasks,” don’t directly impact the response sent to the user but are essential for the application’s functionality.

We’re no different at Bugsink. We have tasks that need to be executed outside the main request-response cycle. The main examples are:

  • Email Notifications: Sending emails can be slow, especially when dealing with external SMTP servers. We don’t want to make the user wait for an email to be sent, so we handle this in the background.

  • Event Processing: Bugsink processes various “events” (basically: errors happening in the applications that are being monitored by Bugsink). Fully processing an event is slower than simply storing it on disk and acknowledging receipt. And we want to stay responsive even when there are large spikes in traffic. So in the most robust configuration, we process events in the background.

Gunicorn’s simple model

Gunicorn is the WSGI HTTP server we use for Bugsink. It uses a pre-fork worker model: it starts multiple worker processes, each handling a single incoming request at a time. The Gunicorn master process, called the arbiter, manages these workers, restarting them when they die, when they take too long to respond, or after a certain (configurable) number of requests.

This model is simple and efficient. It allows us to handle many concurrent requests without the complexity of asynchronous programming or event-driven architectures. But it also means that we can’t run background tasks within the same process as the HTTP server.

Couldn’t we just run background tasks in separate threads within the same process, or start a new subprocess for them? I’d say that would violate Gunicorn’s design so obviously that it practically guarantees hard-to-debug issues down the line. In other words: it’s a bad idea.

Celery: The mainstream solution

Celery is a popular choice for handling background tasks in Python applications. It’s a distributed task queue that supports various message brokers (like Redis or RabbitMQ). The idea is that you delay a function call, which under the hood sends a message to a queue. Workers then pick up these messages and execute the function calls.

Celery is powerful and flexible, but it also brings a lot of complexity. You need to install a message broker, configure users and permissions for it, and keep it up to date and well connected (if it’s on another host).

I know, I know, “how hard can it be?” Answer: harder than not doing anything at all. Also: all of those costs get multiplied by the number of users who want to self-host Bugsink. So we’re literally saving lives here.

Additionally, Celery’s architecture is more complex than what we need. It supports features like retries, periodic tasks, and returning of results. These features add overhead and potential points of failure that we’d rather avoid. Case in point: despite its role as the go-to solution for background tasks in Python, Celery’s basic functionality was in fact broken for over a year when used with the popular redis backend.

Snappea: overview

The basic ideas behind Snappea are:

  • SQLite as a message queue: Tasks are stored in a plain database table rather than sent to an external message broker.

  • There is a single Foreman, running as a separate process from gunicorn, that manages the entire workflow. The Foreman scans the database for new tasks and executes them in worker threads.

And that’s almost it: the implementation detail that’s worth discussing is how we wait for new tasks without busy-waiting. Apart from that, the code deals with handling signals (like SIGINT and SIGTERM) gracefully and logging what it’s doing, but that’s not really worth discussing in detail.

SQLite as a message queue

SQLite is a simple, serverless, self-contained SQL database engine. It’s a perfect fit for our use case: we don’t need the complexity of a full-fledged message broker, and we don’t need to scale to multiple machines.

SQLite is fast and reliable, and it’s included in Python’s standard library. Setup is trivial: the database is just a file (in WAL mode: 3 files) on disk. Bugsink is a Django application, which means we get the full power of Django’s ORM and migrations for managing the database schema. This means that setting up the database is as simple as running:

python manage.py migrate --using=snappea

Another reason we chose SQLite is that we’re already using it for the main database in the recommended production setup. This means we’re already familiar with it and know what the right knobs to turn are.

In the above, the --using=snappea flag tells Django to use the snappea database configuration from settings.py. This is a separate database from the main database used by Bugsink. The snappea database is only used by Snappea, which is important because writing to the database locks the whole database file, and we don’t want to block the main DB on the enqueue operation and vice versa.
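A minimal sketch of what that looks like in settings.py. The database names and paths here are illustrative assumptions, not Bugsink’s actual configuration (and in practice you’d also add a database router so the Task model ends up in the right database):

```python
# Hypothetical settings.py fragment: a separate SQLite database for Snappea.
# Names and paths are illustrative assumptions.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "bugsink.sqlite3",
    },
    "snappea": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "snappea.sqlite3",  # separate file, so a separate write lock
    },
}
```

The point of the two entries is precisely that they are two separate files: a write lock on one never blocks the other.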

We set WAL mode on the snappea database to improve write performance, and to avoid readers blocking writers.

This does come with the disadvantage that the database cannot be put on a network filesystem, but that’s not a problem for us because we run the snappea Foreman on the same machine as the Gunicorn server that enqueues tasks.

The snappea database has a single table, snappea_task, which stores the task name and arguments (serialized as JSON).
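Using plain sqlite3 as a stand-in (the real table is created by Django migrations, so the exact column definitions here are assumptions), the table boils down to something like:

```python
import json
import sqlite3

# Illustrative stand-in for the snappea_task table; in Bugsink this schema
# comes from a Django model and its migrations.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE snappea_task (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        task_name TEXT NOT NULL,  -- dotted path used to look up the function
        args TEXT NOT NULL,       -- JSON-serialized positional arguments
        kwargs TEXT NOT NULL      -- JSON-serialized keyword arguments
    )
""")

# enqueuing a task is just an INSERT:
conn.execute(
    "INSERT INTO snappea_task (task_name, args, kwargs) VALUES (?, ?, ?)",
    ("mail.send_notification", json.dumps(["user@example.com"]), json.dumps({})),
)
```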

Enqueuing a task is as simple as creating a new Task object. We’ve created a tiny decorator named @task for that purpose. When a function that’s decorated with it is called with delay we simply add a new Task object to the database instead of actually calling the function.
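A minimal sketch of the decorator’s shape, with an in-memory list standing in for the database table (in the real implementation, delay creates a Task object via the ORM and touches a wakeup file):

```python
import json

TASK_QUEUE = []  # stand-in for the snappea_task table
registry = {}    # task_name -> function, used by the Foreman to look tasks up

def task(function):
    name = f"{function.__module__}.{function.__qualname__}"
    registry[name] = function

    def delay(*args, **kwargs):
        # instead of calling the function, enqueue a record of the call
        TASK_QUEUE.append((name, json.dumps(args), json.dumps(kwargs)))

    function.delay = delay
    return function

@task
def send_email(address):
    return f"sent to {address}"

send_email.delay("user@example.com")  # enqueued, not executed
```

Calling the function directly still works as usual; only the delay attribute goes through the queue.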

The Foreman

The Foreman is a separate process that runs alongside the Gunicorn server. (It’s called bugsink-runsnappea in ps). It’s responsible for scanning the database for new tasks and executing them in worker threads.

The main loop of the Foreman is really quite simple: just keep scanning the database for new tasks, and execute them when they’re found. Because there’s only one Foreman, there’s no need for complicated reasoning about locking between various Foremen.

With some details omitted, the main loop looks like this:

while True:
    # code that implements waiting for new tasks here [...]

    for task in Task.objects.all():
        function = registry[task.task_name]
        args = json.loads(task.args)
        kwargs = json.loads(task.kwargs)

        task_id = task.id  # remember the id; the row is gone after delete()
        task.delete()

        self.run_in_thread(task_id, function, *args, **kwargs)

Thread-safety

The Foreman runs tasks in separate threads. This is important because some tasks might be slow (like sending emails), and we don’t want to block the Foreman while waiting for them to complete.

The Foreman itself is single-threaded, so we don’t have to worry about thread-safety issues in the Foreman itself. This leaves the tasks as the only place where we need to worry about thread-safety.

For the tasks themselves there’s not much to worry about: the model is that they are “just background tasks”, and don’t interact with each other.

This leaves one final thing to worry about: managing the number of running workers, e.g. counting how many are currently active. This is done with a threading.Semaphore object, which is thread-safe: a worker slot is acquired when a new task is started and released when it finishes.
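That bookkeeping can be sketched as follows; MAX_WORKERS and the function/thread names are illustrative, not Bugsink’s actual code:

```python
import threading

MAX_WORKERS = 4  # assumed setting: upper bound on concurrent tasks
workers_available = threading.Semaphore(MAX_WORKERS)

def run_in_thread(task_id, function, *args, **kwargs):
    workers_available.acquire()  # blocks while MAX_WORKERS tasks are running

    def wrapper():
        try:
            function(*args, **kwargs)
        finally:
            workers_available.release()  # free the slot, even if the task raised

    thread = threading.Thread(target=wrapper, name=f"snappea-worker-{task_id}")
    thread.start()
    return thread
```

Because only the (single-threaded) Foreman calls acquire, and the semaphore itself is thread-safe, no further locking is needed.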

Waiting for new tasks

The above code snippet is missing the most important part: how do we wait for new tasks? This is the obvious drawback of using SQLite as a message queue: an actual message queue would have a blocking get operation that waits until a new message is available, but SQLite doesn’t have that.

The naive approach would be to busy-wait, i.e. to poll the database, sleep for a while, and then poll again. This is inefficient in two ways: it uses CPU cycles to do nothing, and it introduces latency between a task being enqueued and it being executed. It also forces an unnecessary tradeoff between latency and CPU usage: poll too often and we waste CPU cycles; poll too infrequently and we introduce latency.

The solution we came up with is to use inotify. inotify is a Linux API that allows us to watch a file or directory for changes. The inotify API is exposed in Python through the inotify_simple package. The API in that package allows us to do a blocking wait for changes, which is exactly what we need.

Finally, in the task decorator, we write an empty file to a directory that the Foreman is watching. This triggers the inotify event, and the Foreman wakes up and processes the new task. Because the file is always written after the task is added to the database, we don’t have to worry about missing tasks.
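With some details omitted, the wait can be sketched like this. The inotify_simple usage follows that package’s API; the polling fallback is our own addition for illustration (Snappea as described is inotify-only), and the directory handling is an assumption:

```python
import os
import time

try:
    from inotify_simple import INotify, flags
    HAVE_INOTIFY = True
except ImportError:
    HAVE_INOTIFY = False  # assumed fallback for non-Linux systems: polling

def wait_for_wakeup(wakeup_dir, timeout_ms=1000):
    """Block until a wakeup file appears in wakeup_dir (or until timeout)."""
    if HAVE_INOTIFY:
        inotify = INotify()
        inotify.add_watch(wakeup_dir, flags.CREATE)
        if not os.listdir(wakeup_dir):  # nothing pending yet: block on inotify
            inotify.read(timeout=timeout_ms)
        inotify.close()
    else:
        deadline = time.monotonic() + timeout_ms / 1000
        while not os.listdir(wakeup_dir) and time.monotonic() < deadline:
            time.sleep(0.05)
    # drain the wakeup files; the tasks themselves live in the database
    for name in os.listdir(wakeup_dir):
        os.unlink(os.path.join(wakeup_dir, name))
```

Checking the directory before blocking closes the race where a file was written between two iterations of the main loop.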

Throughput

Scalability is always on our minds at Bugsink: when your systems start misbehaving, you don’t want your error tracking system to fall over as well.

Picking SQLite as a queue mechanism, and designing our system around a single-process setup, means we have a very real ceiling on our throughput. After all: SQLite locks the full database on writes.

In practice, we have found that Snappea is not the bottleneck in our system:

  • To test the throughput of task-handling, we wrote a simple task that does nothing at all. We then proceeded by enqueueing as many of these tasks as possible while processing them in the background and measuring how long it took to process them all. We found that in this setup we can deal with hundreds of tasks per second.

  • In our actual setup, the most demanding task is the processing of events. We can process approximately 30 events per second in the background. We’re able to get events into the queue (through the network and SSL processing) at a slightly higher rate (perhaps 50 events per second), leading to a backlog under heavy load.

That is to say: Snappea works as designed here, because it enables us to increase our throughput in spikes of traffic, and to keep the system responsive even when the event-processing is slow. Note that switching the architecture of the Snappea Foreman to use multiple processes instead of threads doesn’t help here: the bottleneck for event-processing is the database (which is locked on writes), not the CPU.

Considerations

Snappea is a custom task queue that we built to handle background tasks in Bugsink. It certainly does what we need it to do. That doesn’t mean there’s no room for improvement or alternatives. Here are some things that we might consider and/or we considered but chose not to do:

  • Integration with Gunicorn: If Gunicorn had integrated support for background tasks, we would have used that instead of Snappea. This would have made the setup simpler and more robust. Such a thing does not exist yet, but it might in the future.

  • Inotify on the database file itself: We use inotify on a directory that contains a file that we write to. This keeps the wakeup mechanism separate from the DB-as-queue mechanism. Writing this article, I realized that we might just as well use inotify on the database file itself. I haven’t tested that yet though.

  • Just using files instead of SQLite: SQLite is a full-fledged database engine, and we’re only using it as a message queue. We could just as well use files on disk for this purpose. The main reason we did not seriously consider this is that SQLite is already fast enough for our purposes (with a factor of 10 to spare), and it comes with the advantage of being easy to query and manage, as well as being robust.

  • Make inotify optional: inotify is a Linux-specific API. If we want to support other operating systems, we need to come up with an alternative. This could be as simple as a sleep-based polling mechanism. Although that would introduce the latency issues mentioned earlier, it would at least make Snappea work on non-Linux systems.

  • Using Python’s asyncio and async/await syntax: these tools are presumably the future of asynchronous programming in Python. However, they require you to rewrite all your code to be asynchronous, which is a huge change and, in my opinion, usually not a change for the better in terms of readability.

ZeroMQ

A final note on alternatives that deserves its own section: we considered using ZeroMQ as a message queue.

ZeroMQ is a message queue that’s written in C and has bindings for many languages, including Python. It’s fast and reliable, and it supports many different messaging patterns. It’s also zero-config, which aligns with our key design goal.

I did not choose ZeroMQ for two reasons:

  • It requires the two communicating processes to both be running when the communication starts. This introduces a “moving part” that we don’t have with the current setup: SQLite implicitly functions as a message mailbox, since it is a file on disk that can be written to and read from by different processes, whether or not their counterpart is running.

  • It introduces communication over network ports; even if it’s just on localhost, this is a potential point of failure in the system (e.g. if there is a firewall blocking the port).

One path that we might consider in the future is to use ZeroMQ as the wakeup mechanism for the Foreman, and SQLite as the message queue. In this setup we would retain the “mailbox” property of SQLite, while using ZeroMQ to avoid the dependency on inotify.

The code

The full code for Snappea is available on GitHub. Note that as it stands, it’s basically just the code as we extracted it from the Bugsink codebase. Whether it’s useful to you depends on your requirements, but at the very least it might be a good starting point for building your own task queue.

Conclusion

Snappea is a custom task queue that we built to handle background tasks in Bugsink. It’s a simple and efficient solution that doesn’t require any external dependencies. It’s not a drop-in replacement for Celery, but it might be a good fit for your use case if you’re looking for a lightweight and easy-to-use task queue.