Skip to content
Snippets Groups Projects

daemon: lower EDNS buffer size to 1232

Merged Lukas Jezek requested to merge 538-lower-default-edns-buffer-size-to-1232 into master
All threads resolved!
Edited by Tomas Krizek

Merge request reports

Pipeline #70960 failed

Pipeline failed for eadc0515 on 538-lower-default-edns-buffer-size-to-1232

Approved by

Merged by Tomas KrizekTomas Krizek 4 years ago (Oct 23, 2020 8:59am UTC)

Merge details

Pipeline #71055 failed

Pipeline failed for 43b40577 on master

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • I just peeked at the test results from respdiff. A lof of fwd tests timed out, but one that did finish showed huge increase in target_disagreements. (This also explains the other jobs timing out, since diffrepro takes quite a while)

    ERROR    target_disagreements: threshold exceeded! sample: 8694 / 100.00% vs threshold: 724 / 98.00%

    I wouldn't release this in 5.0. Let's wait until the other vendors are close to releasing version with updated default EDNS buffer size.

  • Tomas Krizek removed milestone

    removed milestone

  • Petr Špaček added 163 commits

    added 163 commits

    Compare with previous version

  • assigned to @ljezek

  • I was thinking about this for a while and I think we have to split EDNS buffer size for clients and for upstream. This split will enable better comparison in respdiff, right now it is full of noise because of TC bit etc.

    So the next step is to separate current net.bufsize() to two values:

    • net.bufsize_upstream() for communication with other servers
    • net.bufsize_downstream() for communication with clients

    Maybe we will rename this but I do not have a better idea at the moment.

    TODO:

    • split net.bufsize()
    • retest in respdiff with downstream value 4096 and upstream 1232
    • think about backwards compatibility - should we remove net.bufsize() or keep it so it can set both values at once?
    • if we keep it, what it should return if the values for upstream and downstream differ?
    • document new functions
    Edited by Tomas Krizek
  • @tkrizek Any opinion on backwards compatibility in Lua config?

  • Letting it set both values sounds good to me, should be easy enough and good for compatibility. The get doesn't seem so important; EDIT: it's also possible to return two values (so the second one is simply ignored on normal assignment).

    An alternative approach to two functions is to let net.bufsize() accept one or two parameters (positional); EDIT: that would sound consistent with the two-valued return, too.

    Edited by Vladimír Čunát
  • Petr Špaček added usability label and removed easyfix label

    added usability label and removed easyfix label

  • net.bufsize() that sets both values sounds good. It should return two values, in all cases. The first value should be the one that is more relevant to the user (probably the downstream one?), so it doesn't impact backward compatibility.

    net.bufsize(both) and net.bufsize(downstream, upstream) seem a bit better than two additional functions.

  • Lukas Jezek added 242 commits

    added 242 commits

    • 5751b037...6552d471 - 239 commits from branch master
    • 756dc36d - daemon: lowered EDNS buffer size to 1232
    • bd95122d - docs: EDNS bufsize = 1232
    • 8d2b53a5 - net: Split of the EDNS buffer size into upstream and downstream

    Compare with previous version

  • Lukas Jezek resolved all threads

    resolved all threads

  • Lukas Jezek added 1 commit

    added 1 commit

    • 935fa915 - WIP: test new upstream bufsize in CI

    Compare with previous version

  • Petr Špaček added 30 commits

    added 30 commits

    • 935fa915...87253ef7 - 26 commits from branch master
    • a64e79e3 - daemon: lowered EDNS buffer size to 1232
    • 9c441080 - docs: EDNS bufsize = 1232
    • b45c516e - net: Split of the EDNS buffer size into upstream and downstream
    • 3dc76c19 - WIP: test new upstream bufsize in CI

    Compare with previous version

  • TODO: Split into two MRs - one with new net.bufsize() command and second which changes defaults.

  • Lukas Jezek mentioned in merge request !1026 (merged)

    mentioned in merge request !1026 (merged)

  • Author Contributor

    Split net.bufsize() moved into !1026 (merged)

  • Petr Špaček changed title from WIP: daemon: lowered EDNS buffer size to 1232 to WIP: daemon: lower EDNS buffer size to 1232

    changed title from WIP: daemon: lowered EDNS buffer size to 1232 to WIP: daemon: lower EDNS buffer size to 1232

  • Petr Špaček changed milestone to %5.2.0

    changed milestone to %5.2.0

  • Vladimír Čunát added 156 commits

    added 156 commits

    Compare with previous version

  • added 1 commit

    • 84e642e9 - ci/respdiff: keep bufsize towards respdiff at 4K

    Compare with previous version

  • Respdiff waiting for respdiff!90 (merged)

    Otherwise I think it's ready, though it will be nice to do some larger respdiff runs after the above is merged (merge needed to avoid this MR skewing comparisons).

  • added 8 commits

    Compare with previous version

  • mentioned in issue #300 (closed)

  • Tomas Krizek added 42 commits

    added 42 commits

    Compare with previous version

    • Resolved by Tomas Krizek

      Currently, respdiff sets the downstream bufsize to 4096 and keeps the upstream bufsize as-is - which means 1232 for this MR, but 4096 for our reference sample running on master.

      The current state is that fwd-unbound jobs fails to finish in time on this MR. When I ran it manually, I got the following results.

      == Global statistics
      duration                    4842           seconds                                        
      queries                   616341                                                          
      answers                   616341  100.00 % of queries                                     
      
      == Differences statistics
      manually ignored               0    0.00 % of answers (ignoring)                          
      upstream unstable           6278    1.02 % of answers (ignoring)                          
      not 100% reproducible          0    0.00 % of answers (ignoring)                          
      target disagrees           35678    5.85 % of not ignored answers                         
      
      == Target Disagreements
      Field           Count    % of mismatches
      ------------  -------  -----------------
      rcode           35643              99.90
      flags              15               0.04
      timeout            12               0.03
      answertypes         6               0.02
      answerrrsigs        2               0.01
      
      == Field "timeout" mismatch statistics
      Expected    Got        Count    % of mimatches
      ----------  -------  -------  ----------------
      answer      timeout       12              0.03
      
      == Field "rcode" mismatch statistics
      Expected    Got         Count    % of mimatches
      ----------  --------  -------  ----------------
      NOERROR     SERVFAIL    31317             87.78
      NXDOMAIN    SERVFAIL     4322             12.11
      SERVFAIL    NOERROR         2              0.01
      NOERROR     NXDOMAIN        1              0.00
      NXDOMAIN    NOERROR         1              0.00

      j._report.txt

      j._report.json

      It's interesting that kresd takes a lot of time before answering SERVFAIL now: SERVFAIL

      Compared to the current situation on master: SERVFAIL

  • Tomas Krizek added 96 commits

    added 96 commits

    Compare with previous version

  • Tomas Krizek resolved all threads

    resolved all threads

  • Tomas Krizek unmarked as a Work In Progress

    unmarked as a Work In Progress

  • Tomas Krizek approved this merge request

    approved this merge request

  • Tomas Krizek enabled an automatic merge when the pipeline for 6857b7f0 succeeds

    enabled an automatic merge when the pipeline for 6857b7f0 succeeds

  • Tomas Krizek canceled the automatic merge

    canceled the automatic merge

  • Tomas Krizek unapproved this merge request

    unapproved this merge request

    • Resolved by Tomas Krizek

      There's still some issue with iter.tls6 test. The number of SERVFAILs is sligthly increased by about 100-150 (to about 2k differences total). It is reproducible, against current master, when tests are executed at the same time, even with decreased number of jobs. For reference, this is my attempt with just 128 parallel jobs (half of the throughput currently used in CI).

      6857b7f0-edns1232-tls128_vs_5b474f2-ref-tls128_shortlist.iter.tls6 6857b7f0-edns1232-tls128_vs_5b474f2-ref-tls128_shortlist.iter.tls6.diffrepro

  • Tomas Krizek added 7 commits

    added 7 commits

    Compare with previous version

  • Tomas Krizek added 1 commit

    added 1 commit

    • bc2fbf00 - DROP: turn off gnutls logging

    Compare with previous version

  • Tomas Krizek marked as a Work In Progress

    marked as a Work In Progress

  • Tomas Krizek assigned to @tkrizek and unassigned @ljezek

    assigned to @tkrizek and unassigned @ljezek

  • Tomas Krizek added 14 commits

    added 14 commits

    Compare with previous version

  • Tomas Krizek resolved all threads

    resolved all threads

  • Tomas Krizek unmarked as a Work In Progress

    unmarked as a Work In Progress

  • Tomas Krizek approved this merge request

    approved this merge request

  • unassigned @tkrizek

  • Vladimír Čunát approved this merge request

    approved this merge request

  • Tomas Krizek mentioned in commit 43b40577

    mentioned in commit 43b40577

  • merged

  • Please register or sign in to reply
    Loading