Discussion:
SERVFAIL and peak utilization
Alex
2018-07-26 17:07:22 UTC
Hi,

I have a bind-9.11.4 server on a Fedora 28 system and am frequently
seeing SERVFAIL errors like this:

26-Jul-2018 12:54:04.255 query-errors: info: client @0x7f764314a5c0
127.0.0.1#50719 (223.178.102.199.cidr.bl.mcafee.com): query failed
(SERVFAIL) for 223.178.102.199.cidr.bl.mcafee.com/IN/A at
../../../bin/named/query.c:4140

I believe this happens more frequently at times of peak link
utilization, but it also appears to happen during normal times.

This is a local caching server I've set up, but the problem also
appears on other systems that have been set up to be authoritative for
our domain.

How can I troubleshoot this further?

Here is the named.conf for this caching server:

acl "trusted" {
    { 127/8; };
    { 68.195.191.40/29; };
    { 192.168.1.0/24; };
    { 107.155.67.2/32; };
};

options {
    listen-on port 53 { 127.0.0.1; 68.195.191.45; };
    listen-on-v6 port 53 { none; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named.stats"; // _PATH_STATS
    memstatistics-file "/var/named/data/named.memstats"; // _PATH_MEMSTATS
    allow-query { trusted; };
    recursion yes;
    zone-statistics yes;

    // dnssec-enable yes;
    // dnssec-validation yes;
    // dnssec-lookaside auto;

    dnssec-enable no;
    dnssec-validation no;
    dnssec-lookaside no;

    /* Path to ISC DLV key */
    bindkeys-file "/etc/named.iscdlv.key";

    managed-keys-directory "/var/named/dynamic";
};

logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };

    // Record all queries to the box for now
    channel query_info {
        severity info;
        file "/var/log/named.query.log" versions 3 size 10m;
        print-time yes;
        print-category yes;
    };

    // added for fail2ban support
    channel security_file {
        severity dynamic;
        file "/var/log/named.security.log" versions 3 size 30m;
        print-time yes;
        print-category yes;
    };

    channel b_debug {
        file "/var/log/named.debug.log" versions 2 size 10m;
        print-time yes;
        print-category yes;
        print-severity yes;
        severity dynamic;
    };

    // Send the security related messages to a separate file.
    channel audit_log {
        file "/var/log/named.audit.log" versions 4 size 10m;
        severity info;
        print-time yes;
        print-category yes;
    };

    category queries { query_info; };
    category default { b_debug; };
    category config { b_debug; };
    category security { security_file; };
    // category lame-servers { audit_log; };
    category lame-servers { null; };
};

zone "." IN {
    type hint;
    file "/var/named/named.ca";
};

zone "localhost.localdomain" IN {
    type master;
    file "named.localhost";
    allow-update { none; };
};

zone "localhost" IN {
    type master;
    file "named.localhost";
    allow-update { none; };
};

zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa" IN {
    type master;
    file "named.loopback";
    allow-update { none; };
};

zone "1.0.0.127.in-addr.arpa" IN {
    type master;
    file "named.loopback";
    allow-update { none; };
};

zone "0.in-addr.arpa" IN {
    type master;
    file "named.empty";
    allow-update { none; };
};

include "/etc/named.root.key";
include "/etc/rndc.key";
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-***@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
John Miller
2018-07-26 17:57:04 UTC
Hi Alex,

What does your query volume look like on this server? Depending on
volume, the BIND defaults for:

- clients-per-query
- max-clients-per-query
- recursive-clients
- tcp-clients

and others may not be set high enough. Check pp. 106-108 in the
latest 9.11 manual for more details on each of these.

Of course, if you're only seeing SERVFAIL for a handful of domains,
then they may have some sort of delegation issue, or there might be a
network issue between your caching servers and them.

John
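
One way to check whether the failures are confined to a handful of
domains is to tally the SERVFAIL entries by domain. The following is
only a minimal sketch: it assumes each log entry sits on a single line
in the format quoted earlier in this thread, and the helper names
(tally_servfail, parent_domain) are made up for illustration. Taking
the last two labels of the name is a crude stand-in for the real zone
cut.

```python
import re
from collections import Counter

# Matches the "query failed (SERVFAIL) for <name>/<class>/<type>" part of a
# query-errors log line, as it appears in the excerpts in this thread.
SERVFAIL_RE = re.compile(r"query failed \(SERVFAIL\) for (\S+)/(\w+)/(\w+)")


def parent_domain(name, labels=2):
    """Return the last `labels` labels of a name (a rough guess at the zone)."""
    parts = name.rstrip(".").split(".")
    return ".".join(parts[-labels:])


def tally_servfail(lines):
    """Count SERVFAIL log entries per parent domain."""
    counts = Counter()
    for line in lines:
        match = SERVFAIL_RE.search(line)
        if match:
            counts[parent_domain(match.group(1))] += 1
    return counts


# One entry copied from the original post, joined back onto a single line.
sample = [
    "26-Jul-2018 12:54:04.255 query-errors: info: client @0x7f764314a5c0 "
    "127.0.0.1#50719 (223.178.102.199.cidr.bl.mcafee.com): query failed "
    "(SERVFAIL) for 223.178.102.199.cidr.bl.mcafee.com/IN/A at "
    "../../../bin/named/query.c:4140",
]
print(tally_servfail(sample).most_common())
```

In practice the input would be the lines of whichever file receives
the query-errors category. If the counts are spread over many
unrelated domains, a delegation problem at one remote site becomes a
less likely explanation.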
Alex
2018-07-26 18:51:08 UTC
Hi,
I think it's happening too frequently to be explained by just a
misconfigured remote system. Here is my rndc status output, though it
doesn't appear to show all the values you asked about.

It's also occurring for queries to trustworthy remote sources:
26-Jul-2018 14:48:22.975 query-errors: debug 1: client @0x7fddb400c570
127.0.0.1#56094 (mail-dm3nam03on0041.outbound.protection.outlook.com):
query failed (SERVFAIL) for
mail-dm3nam03on0041.outbound.protection.outlook.com/IN/A at
../../../bin/named/query.c:8580

# rndc status
version: BIND 9.11.4-RedHat-9.11.4-1.fc28 (Extended Support Version)
<id:2fe4344>
running on bwimail03.guardiandigital.com: Linux x86_64
4.17.7-200.fc28.x86_64 #1 SMP Tue Jul 17 16:28:31 UTC 2018
boot time: Thu, 26 Jul 2018 18:47:52 GMT
last configured: Thu, 26 Jul 2018 18:47:52 GMT
configuration file: /etc/named.conf (/var/named/chroot/etc/named.conf)
CPUs found: 8
worker threads: 8
UDP listeners per interface: 7
number of zones: 103 (97 automatic)
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 63/900/1000
tcp clients: 0/150
server is up and running

I've also now confirmed it's happening at times of regular network
activity. I'm really stuck. I hope someone can help.

Thanks,
Alex
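
The two client lines in the rndc status output can be turned into
utilization figures. This is a rough sketch that assumes the three
"recursive clients" numbers are current/soft limit/hard limit and the
two "tcp clients" numbers are current/limit; the function name
client_utilization is invented for illustration.

```python
import re

# "recursive clients: current/soft-limit/hard-limit" (assumed meaning)
RECURSIVE_RE = re.compile(r"recursive clients:\s*(\d+)/(\d+)/(\d+)")
# "tcp clients: current/limit" (assumed meaning)
TCP_RE = re.compile(r"tcp clients:\s*(\d+)/(\d+)")


def client_utilization(status_text):
    """Return current usage as a fraction of each limit found in the text."""
    result = {}
    rec = RECURSIVE_RE.search(status_text)
    if rec:
        current, soft, hard = map(int, rec.groups())
        result["recursive_soft"] = current / soft
        result["recursive_hard"] = current / hard
    tcp = TCP_RE.search(status_text)
    if tcp:
        current, limit = map(int, tcp.groups())
        result["tcp"] = current / limit
    return result


# The two relevant lines from the rndc status output above.
status = """\
recursive clients: 63/900/1000
tcp clients: 0/150
"""
for name, fraction in client_utilization(status).items():
    print(f"{name}: {fraction:.1%}")
```

With 63 recursive clients against a limit of 1000, this one sample is
far from the limit. A single snapshot says little about peak periods,
though, so it would need to be sampled repeatedly while the SERVFAILs
are occurring.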
Alex
2018-07-26 19:54:01 UTC
Hi,

I've made some performance adjustments, although I really don't know
whether they're correct, and they don't seem to have solved the problem.
I've also noticed that the SERVFAIL errors seem to happen in bursts:
they occur for a while and then stop. They definitely seem to occur
more during peak mail volume (this is a mail server).

max-clients-per-query 4000;
clients-per-query 4000;
recursive-clients 4000;
tcp-clients 4000;

Here's the named_stats.txt file from "rndc stats":

+++ Statistics Dump +++ (1532630822)
++ Incoming Requests ++
3267 QUERY
++ Incoming Queries ++
2345 A
74 NS
69 PTR
152 MX
569 TXT
58 AAAA
++ Outgoing Rcodes ++
1356 NOERROR
648 SERVFAIL
1070 NXDOMAIN
++ Outgoing Queries ++
[View: default]
8749 A
139 NS
133 PTR
30 MX
640 TXT
6 AAAA
488 DS
87 DNSKEY
[View: _bind]
++ Name Server Statistics ++
3267 IPv4 requests received
2026 requests with EDNS(0) received
6 TCP requests received
3074 responses sent
6 truncated responses sent
1883 responses with EDNS(0) sent
1134 queries resulted in successful answer
2426 queries resulted in non authoritative answer
222 queries resulted in nxrrset
648 queries resulted in SERVFAIL
1070 queries resulted in NXDOMAIN
2190 queries caused recursion
33 duplicate queries received
4 queries dropped
156 recursing clients
3249 UDP queries received
6 TCP queries received
++ Zone Maintenance Statistics ++
++ Resolver Statistics ++
[Common]
143 UDP queries in progress
[View: default]
10272 IPv4 queries sent
2503 IPv4 responses received
611 NXDOMAIN received
1 SERVFAIL received
16 FORMERR received
14 EDNS(0) query failures
448 truncated responses received
7865 query retries
7674 query timeouts
380 IPv4 NS address fetches
33 IPv4 NS address fetch failed
1129 DNSSEC validation attempted
348 DNSSEC validation succeeded
741 DNSSEC NX validation succeeded
1 DNSSEC validation failed
78 queries with RTT < 10ms
1394 queries with RTT 10-100ms
981 queries with RTT 100-500ms
6 queries with RTT 500-800ms
1 queries with RTT 800-1600ms
150 active fetches
523 bucket size
3 REFUSED received
6146 COOKIE send with client cookie only
393 COOKIE sent with client and server cookie
291 COOKIE replies received
291 COOKIE client ok
[View: _bind]
523 bucket size
++ Cache Statistics ++
[View: default]
22101 cache hits
13 cache misses
5896 cache hits (from query)
3416 cache misses (from query)
0 cache records deleted due to memory exhaustion
0 cache records deleted due to TTL expiration
2096 cache database nodes
1039 cache database hash buckets
1352276 cache tree memory total
1022492 cache tree memory in use
1022548 cache tree highest memory in use
393216 cache heap memory total
132096 cache heap memory in use
132096 cache heap highest memory in use
[View: _bind (Cache: _bind)]
0 cache hits
0 cache misses
0 cache hits (from query)
0 cache misses (from query)
0 cache records deleted due to memory exhaustion
0 cache records deleted due to TTL expiration
0 cache database nodes
64 cache database hash buckets
287792 cache tree memory total
29952 cache tree memory in use
29952 cache tree highest memory in use
262144 cache heap memory total
1024 cache heap memory in use
1024 cache heap highest memory in use
++ Cache DB RRsets ++
[View: default]
963 A
299 NS
14 CNAME
23 PTR
19 MX
47 TXT
400 AAAA
57 DS
193 RRSIG
33 NSEC
34 DNSKEY
3 !A
2 !NS
1 !MX
19 !TXT
1 !AAAA
122 !DS
557 NXDOMAIN
1 #RRSIG
1 #NSEC
[View: _bind (Cache: _bind)]
++ ADB stats ++
[View: default]
1021 Address hash table size
916 Addresses in hash table
1021 Name hash table size
1035 Names in hash table
[View: _bind]
1021 Address hash table size
1021 Name hash table size
++ Socket I/O Statistics ++
9861 UDP/IPv4 sockets opened
450 TCP/IPv4 sockets opened
1 Raw sockets opened
9711 UDP/IPv4 sockets closed
454 TCP/IPv4 sockets closed
30 UDP/IPv4 socket bind failures
9824 UDP/IPv4 connections established
446 TCP/IPv4 connections established
7 TCP/IPv4 connections accepted
43 UDP/IPv4 recv errors
150 UDP/IPv4 sockets active
3 TCP/IPv4 sockets active
1 Raw sockets active
++ Per Zone Query Statistics ++
--- Statistics Dump --- (1532630822)
+++ Statistics Dump +++ (1532634389)
++ Incoming Requests ++
26879 QUERY
++ Incoming Queries ++
18386 A
642 NS
351 PTR
1186 MX
5626 TXT
688 AAAA
++ Outgoing Rcodes ++
12312 NOERROR
3066 SERVFAIL
11270 NXDOMAIN
++ Outgoing Queries ++
[View: default]
57901 A
1761 NS
566 PTR
555 MX
4177 TXT
87 AAAA
2 DNSKEY
[View: _bind]
++ Name Server Statistics ++
26879 IPv4 requests received
16404 requests with EDNS(0) received
168 TCP requests received
26648 responses sent
168 truncated responses sent
16357 responses with EDNS(0) sent
10556 queries resulted in successful answer
23582 queries resulted in non authoritative answer
1756 queries resulted in nxrrset
3066 queries resulted in SERVFAIL
11270 queries resulted in NXDOMAIN
14505 queries caused recursion
231 duplicate queries received
26693 UDP queries received
168 TCP queries received
2 COOKIE option received
2 COOKIE - client only
++ Zone Maintenance Statistics ++
++ Resolver Statistics ++
[Common]
[View: default]
65049 IPv4 queries sent
12813 IPv4 responses received
7832 NXDOMAIN received
5 SERVFAIL received
32 FORMERR received
26 EDNS(0) query failures
530 truncated responses received
4 lame delegations received
50747 query retries
52327 query timeouts
1038 IPv4 NS address fetches
205 IPv4 NS address fetch failed
706 queries with RTT < 10ms
7423 queries with RTT 10-100ms
4076 queries with RTT 100-500ms
342 queries with RTT 500-800ms
39 queries with RTT 800-1600ms
9 queries with RTT > 1600ms
523 bucket size
6 REFUSED received
20513 COOKIE send with client cookie only
1485 COOKIE sent with client and server cookie
921 COOKIE replies received
921 COOKIE client ok
[View: _bind]
523 bucket size
++ Cache Statistics ++
[View: default]
158038 cache hits
13 cache misses
62750 cache hits (from query)
19356 cache misses (from query)
0 cache records deleted due to memory exhaustion
126 cache records deleted due to TTL expiration
12112 cache database nodes
4159 cache database hash buckets
4822015 cache tree memory total
4393804 cache tree memory in use
4394140 cache tree highest memory in use
393216 cache heap memory total
132096 cache heap memory in use
132096 cache heap highest memory in use
[View: _bind (Cache: _bind)]
0 cache hits
0 cache misses
0 cache hits (from query)
0 cache misses (from query)
0 cache records deleted due to memory exhaustion
0 cache records deleted due to TTL expiration
0 cache database nodes
64 cache database hash buckets
293568 cache tree memory total
29952 cache tree memory in use
35728 cache tree highest memory in use
262144 cache heap memory total
1024 cache heap memory in use
1024 cache heap highest memory in use
++ Cache DB RRsets ++
[View: default]
3060 A
863 NS
302 CNAME
81 PTR
77 MX
186 TXT
1152 AAAA
85 DS
259 RRSIG
80 NSEC
1 DNSKEY
28 !A
27 !NS
2 !MX
94 !TXT
5 !AAAA
6192 NXDOMAIN
[View: _bind (Cache: _bind)]
++ ADB stats ++
[View: default]
1021 Address hash table size
2125 Addresses in hash table
1021 Name hash table size
1427 Names in hash table
[View: _bind]
1021 Address hash table size
1021 Name hash table size
++ Socket I/O Statistics ++
64830 UDP/IPv4 sockets opened
532 TCP/IPv4 sockets opened
1 Raw sockets opened
64823 UDP/IPv4 sockets closed
726 TCP/IPv4 sockets closed
304 UDP/IPv4 socket bind failures
64519 UDP/IPv4 connections established
519 TCP/IPv4 connections established
197 TCP/IPv4 connections accepted
218 UDP/IPv4 recv errors
7 UDP/IPv4 sockets active
3 TCP/IPv4 sockets active
1 Raw sockets active
++ Per Zone Query Statistics ++
--- Statistics Dump --- (1532634389)
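
The two ratios that matter most in these dumps can be computed
mechanically. This is a minimal sketch, assuming the dump keeps the
"++ Section ++" headers and "<count> <label>" lines shown above; the
names parse_stats and failure_ratios are hypothetical. It should be
given one dump at a time, because a repeated section header starts
that section over.

```python
import re

SECTION_RE = re.compile(r"^\+\+ (.+?) \+\+$")
COUNTER_RE = re.compile(r"^(\d+) (.+)$")


def parse_stats(text):
    """Parse one statistics dump into {section: {label: count}}.

    View sub-headers such as "[View: default]" are skipped. Within a section
    the first occurrence of a label wins, so the view listed first (default
    in the dumps above) takes precedence over later views.
    """
    stats = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            stats[section] = {}
            continue
        counter = COUNTER_RE.match(line)
        if counter and section is not None:
            stats[section].setdefault(counter.group(2), int(counter.group(1)))
    return stats


def failure_ratios(stats):
    """Share of client queries answered SERVFAIL and of upstream queries timed out."""
    server = stats["Name Server Statistics"]
    resolver = stats["Resolver Statistics"]
    return {
        "servfail_share": server["queries resulted in SERVFAIL"]
        / server["IPv4 requests received"],
        "timeout_share": resolver["query timeouts"] / resolver["IPv4 queries sent"],
    }


# Four counters copied from the second dump above.
dump = """\
++ Name Server Statistics ++
26879 IPv4 requests received
3066 queries resulted in SERVFAIL
++ Resolver Statistics ++
[View: default]
65049 IPv4 queries sent
52327 query timeouts
"""
for name, share in failure_ratios(parse_stats(dump)).items():
    print(f"{name}: {share:.1%}")
```

For the second dump this works out to about 11% of client queries
ending in SERVFAIL and about 80% of upstream queries timing out. The
counters are cumulative, so comparing successive dumps shows how the
ratios move during a burst.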
Alex
2018-07-26 21:49:09 UTC
Hi, here is some further debug output for what I believe are the
queries ending in SERVFAIL:

26-Jul-2018 17:44:40.168 query-errors: debug 1: client @0x7fbee80f39b0
127.0.0.1#61547 (69.248.70.96.bad.psky.me): query failed (SERVFAIL)
for 69.248.70.96.bad.psky.me/IN/A at ../../../bin/named/query.c:8580
26-Jul-2018 17:44:40.168 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for 69.248.70.96.bad.psky.me/A in
10.000096: timed out/success
[domain:psky.me,referral:1,restart:2,qrysent:4,timeout:3,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
26-Jul-2018 17:44:40.172 query-errors: debug 1: client @0x7fbed81218a0
127.0.0.1#61547 (176.216.85.209.psbl.surriel.com): query failed
(SERVFAIL) for 176.216.85.209.psbl.surriel.com/IN/A at
../../../bin/named/query.c:8580
26-Jul-2018 17:44:40.172 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for 176.216.85.209.psbl.surriel.com/A
in 10.000128: timed out/success
[domain:psbl.surriel.com,referral:2,restart:1,qrysent:2,timeout:1,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
26-Jul-2018 17:44:40.173 query-errors: debug 1: client @0x7fbedc134ed0
127.0.0.1#61547 (176.216.85.209.dnsbl-3.uceprotect.net): query failed
(SERVFAIL) for 176.216.85.209.dnsbl-3.uceprotect.net/IN/A at
../../../bin/named/query.c:8580
26-Jul-2018 17:44:40.173 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for
176.216.85.209.dnsbl-3.uceprotect.net/A in 10.000097: timed
out/success [domain:dnsbl-3.uceprotect.net,referral:2,restart:1,qrysent:2,timeout:1,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

There appear to be a few timeout errors. Does this indicate a
performance problem with the cable modem or the connection?

Thanks,
Alex
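
The bracketed summary at the end of each "fetch completed" line can be
pulled into a dictionary so that timeouts can be compared with queries
sent per domain. This is a small sketch that assumes the field layout
seen in the three entries above; parse_fetch_summary is a made-up
helper name.

```python
import re

# The bracketed summary at the end of a "fetch completed" debug line, e.g.
# [domain:psky.me,referral:1,restart:2,qrysent:4,timeout:3,...]
SUMMARY_RE = re.compile(r"\[(domain:[^\]]+)\]")


def parse_fetch_summary(line):
    """Return the bracketed key:value summary as a dict, or None if absent."""
    match = SUMMARY_RE.search(line)
    if not match:
        return None
    summary = {}
    for field in match.group(1).split(","):
        key, _, value = field.partition(":")
        # "domain" is the only textual field; the rest are counters.
        summary[key] = value if key == "domain" else int(value)
    return summary


# First "fetch completed" entry from above, joined back onto a single line.
line = (
    "26-Jul-2018 17:44:40.168 query-errors: debug 2: fetch completed at "
    "../../../lib/dns/resolver.c:3927 for 69.248.70.96.bad.psky.me/A in "
    "10.000096: timed out/success "
    "[domain:psky.me,referral:1,restart:2,qrysent:4,timeout:3,lame:0,quota:0,"
    "neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]"
)
summary = parse_fetch_summary(line)
print(summary["domain"], summary["qrysent"], summary["timeout"])
```

In the three entries above, each fetch ended after almost exactly 10
seconds, and timeouts account for 3 of 4, 1 of 2, and 1 of 2 queries
sent, while the lame, quota, neterr, and valfail counters are all zero.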
Alex
2018-07-27 20:46:38 UTC
Hi, I'm still having a problem and haven't received any replies. Is
there anyone with any ideas on how to troubleshoot this?

What other information can I provide to help troubleshoot this?