Improve handling of ConnectionError

Merged Nicki Křížek requested to merge nicki/improve-sendrecv-connection-error-handling into master

If a ConnectionError happens, try to capture which resolver it occurred for. While it may be just a regular one-off network blip, it may also indicate a failure mode in one of the target resolvers. That is especially true if the same connection error happens repeatedly for a single resolver.

Prior to this change, the information about the resolver was lacking. This made it hard to assess whether the connection errors were caused by some network issue, or whether one of the target resolvers was likely at fault.
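
As an illustration of the idea described above, here is a minimal sketch of attaching the resolver name to a ConnectionError so that callers can report it. It is not the code from this merge request: `ResolverConnectionError`, `query_resolver`, and the `send` callable are hypothetical names chosen for the example.

```python
class ResolverConnectionError(ConnectionError):
    """ConnectionError annotated with the resolver it occurred for.

    Hypothetical helper for illustration; not part of respdiff.
    """

    def __init__(self, resolver: str, message: str):
        super().__init__(message)
        self.resolver = resolver


def query_resolver(resolver: str, send):
    """Run send() and re-raise any ConnectionError with the resolver attached.

    `send` stands in for whatever callable performs the network exchange.
    """
    try:
        return send()
    except ConnectionError as exc:
        # Keep the original exception as the cause so no detail is lost.
        raise ResolverConnectionError(resolver, str(exc)) from exc


def refuse():
    """Simulate a resolver that is not accepting connections."""
    raise ConnectionRefusedError(111, "Connection refused")
```

Because the wrapper still derives from ConnectionError, existing `except ConnectionError` handlers would keep working; they can read the extra `resolver` attribute when it is present.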

Merge request reports

Pipeline #135109 failed

Pipeline failed for 56c4e9a7 on nicki/improve-sendrecv-connection-error-handling

Merged by Nicki Křížek 2 months ago (Feb 7, 2025 11:39am UTC)

Merge details

  • Changes merged into master with 0f8abd62.
  • Deleted the source branch.

Pipeline #135165 failed

Pipeline failed for 0f8abd62 on master

Activity

  • Nicki Křížek added 2 commits

  • Nicki Křížek added 2 commits

  • The main addition here is d0ed86ee, which implements the new functionality and also includes a bit of refactoring.

    I've deployed this in our testing environment and respdiff works just fine, so I believe it can be safely merged.

    One thing to note is that if a server actually crashes, many more than just 3 messages may get generated, since sendrecv is typically used by many jobs in parallel. However, since it's a fairly rare occasion and not regular operation, I think the extra data is worth a bit of spam in the log. If someone has an issue with that, the log level could be raised from DEBUG to INFO to hide those messages.

    FTR, a crash may look like this. Previously, it was impossible to tell which server crashed; now the server's name is included in the log:

    ...
    2025-02-05 15:26:38,254    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,255    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,255    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,256    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,242    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,257    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,257    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,257    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,258    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,259    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,259    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,259    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,261    DEBUG  [testbind] [Errno 111] Connection refused
    2025-02-05 15:26:38,481    ERROR  ConnectionError received 3 times in a row (testbind, testbind, testbind), exiting!

    I'll leave this open for a day or so and merge it then unless anyone objects.

  • mentioned in commit 0f8abd62
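
The behaviour visible in the log excerpt above (a DEBUG line per failure, then an ERROR once three failures occur in a row) could be sketched roughly as follows. This is a single-process illustration under assumptions, not the implementation merged here: `ConnectionErrorTracker` and its methods are hypothetical names, and the limit of 3 is taken from the log excerpt.

```python
import logging

MAX_CONSECUTIVE_ERRORS = 3  # assumed from the "3 times in a row" log message


class ConnectionErrorTracker:
    """Remember which resolvers failed in the current run of errors.

    Hypothetical helper for illustration; not part of respdiff.
    """

    def __init__(self, limit: int = MAX_CONSECUTIVE_ERRORS):
        self.limit = limit
        self.resolvers = []  # resolver names for the current error streak

    def record_error(self, resolver: str, exc: Exception) -> bool:
        """Log the failure; return True once the streak reaches the limit."""
        logging.debug("[%s] %s", resolver, exc)
        self.resolvers.append(resolver)
        return len(self.resolvers) >= self.limit

    def record_success(self) -> None:
        """Any successful exchange ends the current streak."""
        self.resolvers.clear()

    def summary(self) -> str:
        """Build the final message naming every resolver in the streak."""
        return "ConnectionError received {} times in a row ({}), exiting!".format(
            len(self.resolvers), ", ".join(self.resolvers)
        )
```

A caller would invoke `record_error()` in its `except ConnectionError` branch and, when it returns True, log `summary()` at ERROR level and exit. With many jobs running in parallel, each worker may emit its own DEBUG lines before the exit takes effect, which would explain seeing more than three of them.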
