Storing Events Outside the Database in Bugsink


Bugsink stores events verbatim, preserving all details for later debugging and analysis. Until now, these events were always stored directly in the database. This approach kept things simple and self-contained, making it easy to set up, back up, and query event data. For small and medium-sized installations, this was fine.
However, one of our customers wanted to store millions of events while keeping Bugsink responsive and manageable. This might sound like a lot, but it’s not an unreasonable request: if you run Bugsink “at capacity” by ingesting millions of events per day, you’ll quickly accumulate a large amount of data. At two million events per day, being able to look back even five days already means storing 10 million events.
To get a sense of the scale: with event sizes averaging 50 KB, storing a million events would take ~50 GB, and storing 5 days of events would take ~500 GB (a quick back-of-envelope sketch follows the list below). SQLite actually deals with this surprisingly well, but there are still secondary effects to consider:
- Physical storage limits – having everything in a single file means you’ll need a single disk with enough space. Although physical storage limits are less of a concern with modern hardware, cloud providers often cap disk sizes, and “prepackaged offerings” tie storage, compute and memory together, thus requiring an upgrade across the board.
- Backup and restore complexity – large databases slow down backups and increase downtime when restoring; even just copying the database file can take a while, and tools like `rsync` (or some equivalent) must take all your data into account, rather than being able to work incrementally.
- Database migrations – SQLite migrations involve full table copies (because SQLite supports only a limited form of `ALTER TABLE`); when all your data is in a single table, this can be a slow operation. Also: while this table copy is happening, you’ll need disk space for both the original table and the new one, which means you must always provision roughly twice your database’s size.
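To make the provisioning math concrete, here is the back-of-envelope calculation from above as a few lines of Python (the ingestion rate is an example figure, not a measurement):

```python
# Back-of-envelope disk usage for retained events.
AVG_EVENT_SIZE_KB = 50      # the average event size assumed above
EVENTS_PER_DAY = 2_000_000  # example rate: "millions per day"
RETENTION_DAYS = 5          # how far back you want to be able to look

events_total = EVENTS_PER_DAY * RETENTION_DAYS
disk_gb = events_total * AVG_EVENT_SIZE_KB / 1_000_000  # KB -> GB

print(f"{events_total:,} events, ~{disk_gb:,.0f} GB of event data")
# 10,000,000 events, ~500 GB of event data

# Per the migration caveat above: provision roughly twice this amount
# of free disk space while a SQLite table copy is in progress.
```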
Note that one thing is actually not a concern: the performance of writes. You might also think that writing events to disk is faster than writing to a database but, at least for the case of SQLite, I have not been able to measure a significant difference.
(I have not tested the performance difference in the case of a MySQL database: it could very well be that the difference there is significant, because MySQL is a client-server database, and writing to a local file is a much simpler operation than writing to a networked database.)
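If you want to verify the write-performance claim for your own setup, a minimal sketch of such a measurement might look like the following; this is not Bugsink’s actual benchmark, just one way to compare per-event inserts into a SQLite table with writing one file per event:

```python
import os
import sqlite3
import time

PAYLOAD = os.urandom(50 * 1024)  # one fake 50 KB event
N = 1000

# Variant 1: one row per event in a SQLite table, committed per event
# to mimic per-request ingestion.
conn = sqlite3.connect("bench.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, data BLOB)")
start = time.perf_counter()
for i in range(N):
    conn.execute("INSERT INTO events (data) VALUES (?)", (PAYLOAD,))
    conn.commit()
sqlite_seconds = time.perf_counter() - start
conn.close()

# Variant 2: one file per event, fsync'd for comparable durability.
os.makedirs("bench_events", exist_ok=True)
start = time.perf_counter()
for i in range(N):
    with open(f"bench_events/{i}.bin", "wb") as f:
        f.write(PAYLOAD)
        f.flush()
        os.fsync(f.fileno())
file_seconds = time.perf_counter() - start

print(f"SQLite: {sqlite_seconds:.2f}s, flat files: {file_seconds:.2f}s for {N} events")
```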
Introducing External Event Storage
To address these concerns, Bugsink 1.3 will introduce an optional way to store events outside the database. Instead of keeping full event JSON in the main database, Bugsink now allows for alternative storage backends. The system keeps key information about the events in the database, ensuring they remain fully queryable while offloading the bulk of the data elsewhere.
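As a rough mental model, a file-based backend boils down to writing each event’s JSON under a base path, keyed by event ID, while the database row keeps the queryable metadata. The sketch below is hypothetical (the class and method names are made up, not Bugsink’s actual interface):

```python
import os

class FileEventStorageSketch:
    """Hypothetical sketch; the real events.storage.FileEventStorage differs."""

    def __init__(self, basepath):
        self.basepath = basepath

    def _path(self, event_id):
        return os.path.join(self.basepath, f"{event_id}.json")

    def save(self, event_id, data):  # hypothetical method name
        os.makedirs(self.basepath, exist_ok=True)
        with open(self._path(event_id), "wb") as f:
            f.write(data)  # the full event JSON, as received

    def load(self, event_id):  # hypothetical method name
        with open(self._path(event_id), "rb") as f:
            return f.read()
```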
Configuration is quite straightforward:
Configuration in Docker:
- `FILE_EVENT_STORAGE_PATH`: path where the events will be stored. This is a path inside the container, so you’ll probably want to mount a volume to this path.
- `FILE_EVENT_STORAGE_USE_FOR_WRITE`: if set to `true`, Bugsink will write new events to the file storage, i.e. this setting is non-optional if you actually want to use the file storage.
Configuration in `bugsink_conf.py`:
```python
BUGSINK = {
    # ... other settings here
    "EVENT_STORAGES": {
        "local_flat_files": {  # a meaningful name of your choosing
            "STORAGE": "events.storage.FileEventStorage",
            "OPTIONS": {
                "basepath": "/some/path/to/store/events",
            },
            "USE_FOR_WRITE": True,
        },
    },
}
```
Configuration, the Why
You’ll note that the configuration requires basically two pieces of information: the path to write to, and whether to write to this path at all.
This last bit is there so that you can switch between storage mechanisms “in flight”, i.e. without having to stop Bugsink to run a migration. This is useful because the scenario of migrating to external storage is basically the expected one: starting as simple as possible, and only moving the data out of the database when you need to.
You’ll also note that the `"EVENT_STORAGES"` setting is a dictionary, which means you can have multiple storage backends configured at the same time. Again, the implication is: one backend for writing new events, and one or more backend configurations that describe how to read events that are already stored. Such a setup also leaves room for introducing new kinds of storage backends (such as S3 or other cloud storage providers) in the future.
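For example, a mid-migration setup might look as follows; the names and paths are made up for illustration, and it assumes a backend without `USE_FOR_WRITE` is simply kept around for reads:

```python
BUGSINK = {
    # ... other settings here
    "EVENT_STORAGES": {
        "old_disk": {  # hypothetical name: events written before the switch
            "STORAGE": "events.storage.FileEventStorage",
            "OPTIONS": {
                "basepath": "/mnt/old-disk/events",
            },
            # no USE_FOR_WRITE: this backend is only read from
        },
        "new_disk": {  # hypothetical name: all new events land here
            "STORAGE": "events.storage.FileEventStorage",
            "OPTIONS": {
                "basepath": "/mnt/new-disk/events",
            },
            "USE_FOR_WRITE": True,
        },
    },
}
```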
Migrating to External Storage
To migrate to external storage, all you need to do is set up the configuration as described above. Bugsink will automatically start writing new events to the external storage.
You may want to move existing events to the external storage as well. This is done with the `migrate_to_current_eventstore` management command. This command will copy all events from the database to the external storage one by one. But as mentioned above, this is not strictly necessary, and you can do it at your leisure.
There’s also a `cleanup_eventstorage` management command that will remove stale or redundant event data from the database. In theory, running this command should not be necessary, because Bugsink cleans up after itself, but it’s there just in case.
Conclusion
Storing events outside the database is a big step for Bugsink. It allows for scalability and manageability that were previously out of reach. It also opens the door for future storage backends, such as S3 or other cloud storage providers.
We’re excited to see how this feature will be used in the wild, and we’re looking forward to hearing your feedback!