Knot Resolver, issue #493

Closed
Created Jul 30, 2019 by ValdikSS (@ValdikSS)

Resolver stops working and returns SERVFAIL until restarted

After some time of normal operation, knot-resolver stops resolving any domains and returns SERVFAIL for all DNS queries. I have the following configuration:

# cat /etc/knot-resolver/kresd.conf
user('knot-resolver','knot-resolver')
cache.size = 300 * MB

net.ipv6 = false

modules = {
        'hints > iterate',  -- Load /etc/hosts and allow custom root hints
        'stats',            -- Track internal statistics
        'predict',          -- Prefetch expiring/frequent records
}

-- minimum TTL = 2 minutes
cache.min_ttl(120)

dofile("/etc/knot-resolver/knot-aliases-alt.conf")

policy.add(
    policy.suffix(
        policy.STUB(
            {'127.0.0.4'}
        ),
        policy.todnames(blocked_hosts)
    )
)


# cat /etc/knot-resolver/knot-aliases-alt.conf
blocked_hosts = {
"0000a-fast-proxy.de.",
"002cc20.icu.",
"007ingyenletoltes.hu.",
"007rc.biz.",
"007slots.com.",
"00seeds.com.",
"010119azino777.com.",
"010119azino777.ru.",
…
"zzzes.ru.",
"zzztorrent.net.",
"zzzz1.live.",
"zzzz2.live.",
}

Both normal recursive queries and queries that should be forwarded to 127.0.0.4 (names matching blocked_hosts) fail.
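
To make the two query paths concrete, here is a minimal Python sketch of how a query name could be sorted into "stub to 127.0.0.4" or "normal recursion". It is an illustration only, not kresd's implementation: it assumes policy.suffix compares whole labels from the right, and the helper names (to_labels, matches_suffix, choose_upstream) are hypothetical. The two list entries are taken from the blocked_hosts excerpt above.

```python
def to_labels(name: str) -> tuple:
    """Split a domain name into lowercase labels, ignoring the trailing dot."""
    return tuple(label for label in name.lower().rstrip(".").split(".") if label)


def matches_suffix(qname: str, suffixes: list) -> bool:
    """Return True if qname equals, or is a subdomain of, any listed suffix."""
    q = to_labels(qname)
    for suffix in suffixes:
        s = to_labels(suffix)
        # Compare whole labels from the right, so "x007rc.biz." does not
        # match the entry "007rc.biz." (assumed behaviour, see lead-in).
        if s and len(q) >= len(s) and q[-len(s):] == s:
            return True
    return False


# Two entries copied from the blocked_hosts excerpt in this report.
blocked_hosts = ["007rc.biz.", "zzztorrent.net."]


def choose_upstream(qname: str) -> str:
    """Describe which path a query would take under the policy rule above."""
    if matches_suffix(qname, blocked_hosts):
        return "stub 127.0.0.4"
    return "normal recursion"
```

With this sketch, choose_upstream("www.007rc.biz.") takes the stub path and choose_upstream("example.com.") takes normal recursion. In the failure state described here, both paths return SERVFAIL.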

I've just enabled verbose logging to monitor the issue, but the log output seems to be heavily buffered. New entries show up in journalctl in bursts, a large batch every 30 seconds or so. I'm not sure whether this is some sort of buffering and is to be expected, or whether it points to some kind of locking problem. It even triggered the systemd watchdog once:

systemd[1]: kresd@1.service: Watchdog timeout (limit 10s)!
systemd[1]: kresd@1.service: Killing process 23036 (kresd) with signal SIGABRT.
systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
systemd[1]: kresd@1.service: Unit entered failed state.
systemd[1]: kresd@1.service: Failed with result 'watchdog'.
systemd[1]: kresd@1.service: Service hold-off time over, scheduling restart.

The issue happens irregularly. It used to work fine for weeks, but in the last 3 days it has happened 3 times. Sometimes it takes dozens of hours to appear, sometimes only several minutes. I did not change the configuration, and I updated the software only after the second occurrence. It happens on 4.1.0.

Right now I'm running verbose logging and will update this issue when it happens again.
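
Alongside the verbose log, an external probe can timestamp the moment the resolver starts failing. The following is a rough Python sketch using only the standard library. It assumes the resolver listens on 127.0.0.1:53 and that example.com. is an acceptable test name. Both are placeholders, and the function names are hypothetical.

```python
import socket
import struct
import time

# Names for a few common DNS response codes (RFC 1035 values).
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(qname: str, qid: int) -> bytes:
    """Encode a minimal DNS query for an A record with the RD flag set."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = [l for l in qname.rstrip(".").split(".") if l]
    question = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    question += b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def parse_rcode(response: bytes) -> str:
    """Return the response code name from the low 4 bits of the flags field."""
    if len(response) < 4:
        return "MALFORMED"
    (flags,) = struct.unpack("!H", response[2:4])
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def probe(server: str, qname: str, port: int = 53, timeout: float = 3.0) -> str:
    """Send one UDP query and report the RCODE, or TIMEOUT if no reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(qname, qid=0x1234), (server, port))
        try:
            response, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
    return parse_rcode(response)


def watch(server: str, qname: str, interval: float = 30.0) -> None:
    """Probe forever, printing one timestamped result per interval."""
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), probe(server, qname), flush=True)
        time.sleep(interval)
```

Calling watch("127.0.0.1", "example.com.") prints one line per interval. The first SERVFAIL or TIMEOUT line gives a timestamp to line up against the verbose log in journalctl.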

Edited Jan 09, 2021 by ValdikSS