Grouping Connection Errors
While working on Bugsink, we definitely “eat our own dogfood”. That is: we use Bugsink to track problems in Bugsink. This means we use the Python Sentry SDK to track errors in our own code.
One of the issues we ran into was that connection-related errors were not grouped in a useful way. The problem is that when you have a connection issue – say a database going down or becoming unreachable – the resulting errors can show up anywhere in your code. That’s simply because a typical DB-heavy application talks to the database all over the place. The same goes for RabbitMQ or Redis connections: background work is enqueued from many different parts of the codebase, so a broken connection surfaces in many different places.

Each of these errors has a different stacktrace and is therefore reported as a separate issue. The result is a pile of ungrouped issues that all stem from the same root cause. Worse, that root cause isn’t even a problem in your code but an infrastructure issue, which makes the noise all the more annoying.
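To make that concrete, here is a minimal, self-contained sketch of the pattern (the function names and the _query stand-in are made up for illustration): two unrelated entry points hit the same underlying connection problem, but each produces its own stacktrace – and, under default stacktrace-based grouping, its own issue.

import socket

def load_dashboard():
    # called from a web request
    return _query("SELECT COUNT(*) FROM orders")

def nightly_cleanup():
    # called from a background job
    return _query("DELETE FROM sessions WHERE expired = true")

def _query(sql):
    # Stand-in for "talk to the database": raises ConnectionRefusedError
    # when nothing is listening on the port, much like a real client
    # library raises its own connection error when the DB is down.
    with socket.create_connection(("127.0.0.1", 5432), timeout=1) as conn:
        conn.sendall(sql.encode())

Both tracebacks end in _query, but they start in different frames, which is enough for stacktrace-based grouping to treat them as two distinct issues.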
Sentry’s own documentation highlights this limitation:
For example, if a generic error, such as a database connection error, has many different stack traces and never groups them together, […]
The documentation does provide a rough sketch of how to solve this problem using fingerprints, but it’s not very detailed. So we decided to write a more detailed guide on how to solve it. The code snippet we provide has solutions for database connection errors, RabbitMQ connection errors, and Redis connection errors built in.
How to Do It
Here’s how you can implement custom fingerprinting in Sentry’s Python SDK:
import sentry_sdk

# Note: the excessive string-matching below is intentional:
# I'd rather keep our error-handling code as simple as possible
# than rely on all kinds of imports of Exception classes.
def _name(type_):
    try:
        return type_.__module__ + "." + type_.__name__
    except Exception:
        try:
            return type_.__name__
        except Exception:
            return "unknown"
def fingerprint_exc(event, exc_info):
    type_name = _name(exc_info[0])
    exc = exc_info[1]

    if type_name == "builtins.SystemExit":
        # This could happen at any time; it typically means "a part
        # of your application is so slow it is being killed".
        # Whether such problems are individual issues or not is a
        # matter of taste.
        event['fingerprint'] = ['systemexit']

    elif type_name in ["django.db.utils.OperationalError"] \
            and '1366' not in str(exc):
        # '1366, "Incorrect string value' is the single
        # OperationalError that I have seen so far that is not a
        # connection problem.
        event['fingerprint'] = ['database-connection-error']

    elif type_name.startswith("amqp.exceptions."):
        # Strictly speaking, not all amqp exceptions are connection
        # errors, but when a problem occurs at that level we are
        # generally not interested in anything more specific than
        # "something is wrong with [the connection to] rabbitmq".
        event['fingerprint'] = ['rabbit-connection-error']

    elif type_name in ["redis.exceptions.ConnectionError",
                       "redis.exceptions.TimeoutError"]:
        # (redis exception classes live in the redis.exceptions module)
        # List of connection errors obtained by looking at the redis
        # source code and guessing which ones are
        # connection-related. The current list further includes:
        #
        # AuthenticationError, AuthenticationWrongNumberOfArgsError,
        # BusyLoadingError, ChildDeadlockedError, DataError,
        # ExecAbortError, InvalidResponse, ModuleError,
        # NoPermissionError, NoScriptError, ReadOnlyError,
        # RedisError, ResponseError, some of which may need to be
        # added too.
        event['fingerprint'] = ['redis-connection-error']

    return event
def fingerprint_log_record(event, log_record):
    # (hook for future use)
    return event


def fingerprint_before_send(event, hint):
    if 'exc_info' in hint:
        return fingerprint_exc(event, hint['exc_info'])

    if 'log_record' in hint:
        return fingerprint_log_record(event, hint['log_record'])

    return event
# Apply the custom fingerprinting logic
sentry_sdk.init(
    # ...
    before_send=fingerprint_before_send,
)
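Before relying on this in production you may want to convince yourself that the hook does what you expect. One quick way – a hypothetical sanity check, not part of the setup itself – is to call it directly with a hand-built hint, shaped like the one the SDK passes to before_send:

import sys

def _sanity_check():
    try:
        raise SystemExit("simulated 'worker killed for being too slow'")
    except SystemExit:
        exc_info = sys.exc_info()

    # feed the hook an empty event plus an SDK-shaped hint
    event = fingerprint_before_send({}, {"exc_info": exc_info})
    assert event["fingerprint"] == ["systemexit"]

_sanity_check()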
Conclusion
By customizing error fingerprints in Sentry’s Python SDK, you can significantly improve the way connection issues are grouped, making your monitoring more efficient.
If you found this guide helpful, consider checking out Bugsink – our drop-in replacement for Sentry, designed to further streamline your error tracking with additional features and enhanced flexibility.
Even if you don’t switch just yet: because Bugsink is a drop-in replacement for Sentry, the code above goes into your regular Sentry-SDK configuration, so it will keep working – and keep your issues grouped – when you point that SDK at Bugsink.
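Concretely, “drop-in” means the init call itself does not change; only the DSN does. A sketch with placeholder values (the key, host and project id below are illustrative, not real):

sentry_sdk.init(
    dsn="https://<key>@<your-bugsink-host>/<project_id>",
    before_send=fingerprint_before_send,
)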