Introducing Search

Klaas van Schelven; March 13 - 5 min read

Bugsink 1.4 brings major improvements to search, making it easier to find relevant errors quickly.

TL;DR:

  • Support for searching both Issues and individual Events.
  • Search is built entirely on tags (both user-supplied and deduced from event properties).
  • Simple query language: key:value pairs for structured filtering.
  • Search is implemented directly in the database, ensuring a simple and efficient architecture.

Background

When Bugsink was first launched, it had basic search functionality, but it was limited to Issues and only covered exception type and value. This worked well for many users.

As more teams adopted Bugsink, new usage patterns emerged that called for a more powerful search feature. One such pattern is collecting large volumes of errors and sifting through them later. In this scenario, the first step isn’t investigating a specific error – it’s finding the relevant issue or event in a sea of data to begin with.

Another pattern that was not well-supported was debugging across environments. It was perfectly possible to send errors from multiple environments to Bugsink, but those would end up in the same bucket, making it hard to distinguish between them.

The new search functionality addresses these and other use cases, providing a flexible way to locate errors. As a happy side effect, it also makes it easier to explore the data you already have by providing insight into the distribution of tags across a single issue’s events.

Example Searches

Before we dive into the implementation details, let’s look at some examples of how the new search feature can be used. Here are some queries you can now run in Bugsink:

  • Look up a specific request by trace ID: trace:1f2d3e4f5a6b5c8df9e0a1b2c3d4e5f.
  • Filter errors by environment to compare production vs. staging: environment:production.
  • Find all unhandled exceptions in a recent release: release:v2.3.1 handled:false.
  • Combine a filter by environment with a “contains” match on the Exception Type: environment:production ValueError.

These examples show how Bugsink’s search works: queries use key-value pairs for structured filtering, but free text can also be used to match exception type and value. Next, we’ll break down the query language and its rules.

Search Query Language Syntax

Bugsink employs a straightforward query language to facilitate searches. Users can input queries in the format key:value to filter results based on specific tags. (Use double quotes in the value to search for phrases.)

For example, to find events associated with a particular release, one might use release:1.0.0. This minimalistic approach ensures that users can perform precise searches without the need to learn complex syntax. From the perspective of Bugsink, it allows us to keep the search implementation simple and efficient while keeping the door open to future expansions.

The actual syntax of the query language is implemented with (basically) two regular expressions:

  1. Key-value pair: (\S+):([^\s"]+)
  2. Quoted pair: (\S+):"([^"]+)"

Everything that’s not matched by these two regexes is considered a free-text search, and is matched against the exception type and value.
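
To make the rules above concrete, here is a minimal sketch of a parser built on the two regexes. It is not Bugsink's actual code: the function name parse_query, the order in which the two patterns are tried, and the way leftover text is joined are all assumptions made for illustration.

```python
import re

# The two patterns quoted above. Trying the quoted form first is an
# assumption of this sketch, so that key:"some phrase" is matched as a whole.
QUOTED_PAIR = r'(\S+):"([^"]+)"'
PLAIN_PAIR = r'(\S+):([^\s"]+)'
TOKEN = re.compile(f"{QUOTED_PAIR}|{PLAIN_PAIR}")


def parse_query(query):
    """Split a query into a list of (key, value) filters and the free text."""
    filters = []
    leftovers = []
    position = 0
    for match in TOKEN.finditer(query):
        # Whatever sits between two matches counts as free text.
        leftovers.append(query[position:match.start()])
        key = match.group(1) or match.group(3)
        value = match.group(2) or match.group(4)
        filters.append((key, value))
        position = match.end()
    leftovers.append(query[position:])
    # Collapse the leftover fragments into a single space-separated string.
    free_text = " ".join(" ".join(leftovers).split())
    return filters, free_text
```

With this sketch, parse_query('environment:production ValueError') returns ([('environment', 'production')], 'ValueError'): one tag filter, plus free text to match against the exception type and value.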

At its core, Bugsink’s search is built around tags: structured key-value pairs attached to events and issues. These tags allow for flexible filtering without requiring a full-text search engine. Tags are kept in the database, in a many-to-many relationship with events and issues.

This raises the question: where do these tags come from? There are basically two kinds: those explicitly set by the SDK, and those that Bugsink itself deduces from the event data, and then explicitly stores.

  1. User-supplied tags: Most SDKs offer the ability to add arbitrary tags to events as key/value pairs through a function like set_tag; these tags show up in the event data as a top-level "tags" attribute, which is then picked up by Bugsink.

  2. Deduced tags: Bugsink automatically extracts relevant metadata from events into tags. Examples include browser name, OS version, and whether an exception was handled, but also top-level event attributes like release, environment, or server_name. (The full list of such tags can be gleaned from the source code if you’re curious.)

This combination of user-supplied and deduced tags provides a rich set of filters for search, making it easy to find specific events or issues.
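
As a rough illustration of how the two kinds of tags could be merged, here is a small sketch. It is not Bugsink's implementation: the function name collect_tags is hypothetical, the event is assumed to be a plain dict whose "tags" attribute is itself a dict, only the three top-level attributes named above are deduced, and letting user-supplied tags win on a key clash is an arbitrary choice of this sketch.

```python
# Top-level event attributes mentioned in the text above.
DEDUCED_KEYS = ("release", "environment", "server_name")


def collect_tags(event):
    """Return a {key: value} dict of tags for one event payload (a dict)."""
    tags = {}

    # Deduced tags: copy the top-level attributes that are present and non-empty.
    for key in DEDUCED_KEYS:
        if event.get(key):
            tags[key] = str(event[key])

    # User-supplied tags: the top-level "tags" attribute (assumed to be a dict).
    # In this sketch they overwrite a deduced tag with the same key.
    for key, value in event.get("tags", {}).items():
        tags[str(key)] = str(value)

    return tags
```

Browser name, OS version, and the handled flag are left out on purpose: where they live in the event data is not described here, so the sketch does not guess.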

Issues & Events

Before Bugsink 1.4, search only worked for Issues, and only on exception type and value. The tag-based search introduced in this version extends search to Events as well.

Event search is available directly from any of the “issue detail” pages, such as the Stacktrace view, Event details page, or Issue’s event list. For the event list, the search bar at the top of the page simply restricts the list to events matching the query. On pages that show information for a single event (such as the stacktrace view or the event details view), the search bar ensures that the event shown matches the given query, and the navigation buttons move between events that match the query.
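
The navigation behaviour on single-event pages can be pictured with a small sketch. It assumes, purely for illustration, that the matching events of an issue are available as a sorted list of integer ids; the function name neighbours is hypothetical and this is not how Bugsink itself orders or loads events.

```python
from bisect import bisect_left, bisect_right


def neighbours(matching_ids, current_id):
    """Return (previous, next) matching event ids around current_id.

    matching_ids must be sorted in ascending order. Either element of the
    result is None when there is no matching event on that side.
    """
    before = bisect_left(matching_ids, current_id)
    after = bisect_right(matching_ids, current_id)
    previous_id = matching_ids[before - 1] if before > 0 else None
    next_id = matching_ids[after] if after < len(matching_ids) else None
    return previous_id, next_id
```

In this sketch the "previous" and "next" buttons skip every event that does not match the query, which is the effect described above.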

The search query is preserved when navigating between issues and events, so you can easily switch between the two views without losing your place.

In a typical workflow, you might start by searching for “Issues with property X”. This query will return a list of issues that have at least one event with a tag property:X. Clicking on an issue will show only the events that contributed to that issue and have the tag property:X. This way, you can drill down into the data without having to re-enter the search criteria.

Design Choices

The search functionality in Bugsink is built directly on the database. Tag keys, values, and their relationships to issues and events are stored in the database, and search queries are simply SQL queries that filter based on these tags.
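
To show what “search queries are simply SQL queries” can look like, here is a self-contained sketch against an in-memory SQLite database. The schema, table names, and function names are invented for this example and are not Bugsink's actual models. It also assumes that an issue matches when a single one of its events carries all requested tags, which may differ from the real behaviour.

```python
import sqlite3

# Hypothetical schema: events belong to an issue, and tags are linked to
# events through a many-to-many table.
SCHEMA = """
CREATE TABLE event (id INTEGER PRIMARY KEY, issue_id INTEGER NOT NULL);
CREATE TABLE tag (id INTEGER PRIMARY KEY, key TEXT NOT NULL, value TEXT NOT NULL,
                  UNIQUE (key, value));
CREATE TABLE event_tag (event_id INTEGER NOT NULL REFERENCES event (id),
                        tag_id INTEGER NOT NULL REFERENCES tag (id),
                        PRIMARY KEY (event_id, tag_id));
CREATE INDEX event_tag_by_tag ON event_tag (tag_id, event_id);
"""

# One EXISTS clause per key:value pair; an event must satisfy all of them.
HAS_TAG = ("EXISTS (SELECT 1 FROM event_tag et JOIN tag t ON t.id = et.tag_id"
           " WHERE et.event_id = e.id AND t.key = ? AND t.value = ?)")


def add_event(conn, event_id, issue_id, tags):
    """Store one event together with its tags (a {key: value} dict)."""
    conn.execute("INSERT INTO event (id, issue_id) VALUES (?, ?)", (event_id, issue_id))
    for key, value in tags.items():
        conn.execute("INSERT OR IGNORE INTO tag (key, value) VALUES (?, ?)", (key, value))
        (tag_id,) = conn.execute(
            "SELECT id FROM tag WHERE key = ? AND value = ?", (key, value)).fetchone()
        conn.execute("INSERT INTO event_tag (event_id, tag_id) VALUES (?, ?)",
                     (event_id, tag_id))


def _where(filters):
    """Build a WHERE clause and its parameters from (key, value) pairs."""
    clause = " AND ".join([HAS_TAG] * len(filters)) or "1 = 1"
    params = [part for pair in filters for part in pair]
    return clause, params


def search_events(conn, filters, issue_id=None):
    """Return ids of events carrying every tag in filters, optionally within one issue."""
    clause, params = _where(filters)
    sql = f"SELECT e.id FROM event e WHERE {clause}"
    if issue_id is not None:
        sql += " AND e.issue_id = ?"
        params.append(issue_id)
    return [row[0] for row in conn.execute(sql + " ORDER BY e.id", params)]


def search_issues(conn, filters):
    """Return ids of issues that have at least one event matching filters."""
    clause, params = _where(filters)
    sql = f"SELECT DISTINCT e.issue_id FROM event e WHERE {clause} ORDER BY e.issue_id"
    return [row[0] for row in conn.execute(sql, params)]
```

The optional issue_id argument mirrors the drill-down workflow: the same filters that selected an issue can be reused to list only the matching events inside it. Everything runs in the one database, with no separate search service involved.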

We’re fully aware that fancier solutions exist, like building on top of Lucene, or integrating with Elastic, ClickHouse, or other search engines.

Doing that, however, would fly in the face of Bugsink’s design principles. Bugsink is meant to be simple to deploy and maintain, and adding a separate search engine would introduce a lot of complexity and maintenance overhead.

The choice for a simple architecture ensures that users can set up and run Bugsink without the need for extensive infrastructure, embodying our commitment to a “deploy once, and then forget it” experience.

Performance impact

Both the effect of the new search functionality on digestion and the performance of the search queries themselves have been benchmarked extensively. In short:

  • Search queries on tags are fast: even when searching over hundreds of thousands of events, the 5s limit is rarely reached. (The 5s limit is the maximum time a query is allowed to run before being killed by the server.)

  • Searching on Exception Type and Value (using “contains” semantics) is still supported, but since such searches are not indexed, they will be slow on large datasets. A future version of Bugsink will introduce either a full-text search or some other mechanism to speed up these queries.

  • Event digestion speed is affected by the new search functionality. Unfortunately, tagging events and issues on ingest represents actual work, and that comes at a price. The exact impact will depend on the number of tags and events, and the complexity of the queries. But initial benchmarks do show a significant impact: approximately 30% slower digestion. (On the test environment, digestion speed went from 40 events/s to 27 events/s.)

All benchmarks have been done against sqlite3, which is still the default database backend for Bugsink.
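
For readers who want to relate the two digestion figures above to the “approximately 30%” number, here is the arithmetic as a short sketch. The function name digestion_impact is made up for this example; the only inputs are the 40 and 27 events/s quoted above.

```python
def digestion_impact(before_per_s, after_per_s):
    """Return (relative throughput drop, relative extra time per event)."""
    throughput_drop = (before_per_s - after_per_s) / before_per_s
    extra_time_per_event = before_per_s / after_per_s - 1
    return throughput_drop, extra_time_per_event


drop, extra = digestion_impact(40, 27)
print(f"throughput: {drop:.1%} lower")        # throughput: 32.5% lower
print(f"time per event: {extra:.1%} higher")  # time per event: 48.1% higher
```

Measured as throughput, the drop from 40 to 27 events/s is about a third; measured as time spent per event, the same change is closer to half again as much. Both describe the same benchmark result.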

Conclusion

The new search functionality in Bugsink 1.4 is a significant step forward in making it easier to find and explore errors. By leveraging tags, Bugsink provides a flexible way to filter and search for issues and events, making it easier to locate specific errors and gain insights into your data.

We hope you enjoy the new search feature, and we’re looking forward to hearing your feedback. If you have any questions or suggestions, please don’t hesitate to reach out to us on GitHub or Discord.