Discussion:
BIND 8.2.1 creating lots of CPU usage for syslogd and named
Marcel Janssen
1999-08-09 21:00:26 UTC
Permalink
Hi,

I'm running BIND 8.2.1 on Solaris 2.6

When I start named SYSLOGD and NAMED are using
lots of CPU.

This is some output from top
167 root 10 33 0 4136K 1984K sleep 21.1H 45.70% syslogd
6721 root 1 7 0 4312K 1888K sleep 0:18 16.45% named

When I turn on querylogging I get this about 200 times a second
08-Aug-1999 13:57:42.559 XX+/134.188.149.78/78.149.188.134.in-addr.arpa/PTR
08-Aug-1999 13:57:42.561 XX+/134.188.149.78/ts1-st16.oce.nl/A

In which 134.188.149.78 is the ip adres and ts1-st16.oce.nl is the hostname
running NAMED. Thus it performs about 200 queries a second for its own
ip/hostname.

Can anybody help me?

Thanks,
Marcel
Michael van Elst
1999-08-09 22:19:20 UTC
Permalink
Post by Marcel Janssen
Hi,
I'm running BIND 8.2.1 on Solaris 2.6
When I start named SYSLOGD and NAMED are using
lots of CPU.
This is some output from top
167 root 10 33 0 4136K 1984K sleep 21.1H 45.70% syslogd
6721 root 1 7 0 4312K 1888K sleep 0:18 16.45% named
When I turn on querylogging I get this about 200 times a second
08-Aug-1999 13:57:42.559 XX+/134.188.149.78/78.149.188.134.in-addr.arpa/PTR
08-Aug-1999 13:57:42.561 XX+/134.188.149.78/ts1-st16.oce.nl/A
In which 134.188.149.78 is the ip adres and ts1-st16.oce.nl is the hostname
running NAMED. Thus it performs about 200 queries a second for its own
ip/hostname.
Can anybody help me?
Apparently the machine doesn't know its own name without querying
the nameserver. This often means that you haven't entered the
name into /etc/hosts, but more likely, you use a search order
of 'hosts: dns, files' or only 'hosts: dns' in /etc/nsswitch.conf.

The result ist that syslog tries to resolve the origin of each
log message and thus creates a new query and a new log message
in an endless recursion.
--
Michael van Elst

Internet: mlelstv at serpens.swb.de
"A potential Snark may lurk in every tree."
Marcel Janssen
1999-08-10 09:13:49 UTC
Permalink
Post by Michael van Elst
Apparently the machine doesn't know its own name without querying
the nameserver. This often means that you haven't entered the
name into /etc/hosts, but more likely, you use a search order
of 'hosts: dns, files' or only 'hosts: dns' in /etc/nsswitch.conf.
The result ist that syslog tries to resolve the origin of each
log message and thus creates a new query and a new log message
in an endless recursion.
I already tried "host: files dns". It doesn't matter.

Marcel
Michael Voight
1999-08-10 17:14:17 UTC
Permalink
Post by Marcel Janssen
Post by Michael van Elst
Apparently the machine doesn't know its own name without querying
the nameserver. This often means that you haven't entered the
name into /etc/hosts, but more likely, you use a search order
of 'hosts: dns, files' or only 'hosts: dns' in /etc/nsswitch.conf.
The result ist that syslog tries to resolve the origin of each
log message and thus creates a new query and a new log message
in an endless recursion.
I already tried "host: files dns". It doesn't matter.
Marcel
It's not going to matter. It isn't going to be much better if it
continues to get the name out of a host file. The question is why it is
looking in the first place. Your syslog config should not reference a
hostname in the message destination, it should only show a file.
This is NOT a DNS problem. DNS is getting the queries.

Michael

Jim Reid
1999-08-10 12:37:36 UTC
Permalink
Marcel> I'm running BIND 8.2.1 on Solaris 2.6

Marcel> When I start named SYSLOGD and NAMED are using lots of CPU.

Marcel> When I turn on querylogging I get this about 200 times a second
Marcel> 08-Aug-1999 13:57:42.559 XX+/134.188.149.78/78.149.188.134.in-addr.arpa/PTR
Marcel> 08-Aug-1999 13:57:42.561 XX+/134.188.149.78/ts1-st16.oce.nl/A

Marcel> In which 134.188.149.78 is the ip adres and ts1-st16.oce.nl
Marcel> is the hostname running NAMED. Thus it performs about 200
Marcel> queries a second for its own ip/hostname.

Marcel> Can anybody help me?

This behaviour is to be expect if you use syslog for query logging and
you're getting upwards of 200 queries a second. Each query causes
named to send a log entry via syslog() to syslogd, causing a context
switch. syslogd looks at the message, figures out what to do with it
and writes it to some file. It gives up the CPU. Then named gets a
chance to run again (=> yet another process context switch) and 5ms
later it gets another query. So no wonder there's lots of system
activity, process switching and a busy syslogd process.

When there are lots of queries, it's better to set up a file channel
in named.conf for query logging. That way, there's no syslog or
context switching overheads and named can block-buffer the I/O to that
file. Something like this in the logging{} statement will give you the
general idea:

channel query_logging {
file "/home/named/querylogs/qlog"
/* keep up to 7 qlog files of 60 Mbytes */
versions 7 size 60M;
/* log the time of each query */
print-time yes;
};

...
category queries {
/* send all query logs to out query_logging channel */
query_logging;
}

The next problem is to find out what's sending out these queries. top
will show you the busiest processes on your system. Then a system call
analyser like ktrace or truss can prove which of them is battering the
name server.

The + in the querylog entry is a new thing in BIND8.2. It indicates
the request had the recursion desired bit set. This usually indicates
a request from a resolver because name servers are generally able to
recurse for themselves. So it looks as though you've got a runaway
application stuck in an infinite loop doing a forward and reverse
lookup for the local machine.
Michael Voight
1999-08-10 15:29:21 UTC
Permalink
Jim,

Someone else mentioned earlier that it appeared syslog might be
recursively looking up the local hostname. Syslog should not be
recursively doing name lookups. This might be because they told syslog
to log DNS queries. Since syslog is doing a dns query that needs to get
logged to syslog, the cycle never ends.

Michael
Post by Jim Reid
Marcel> I'm running BIND 8.2.1 on Solaris 2.6
Marcel> When I start named SYSLOGD and NAMED are using lots of CPU.
Marcel> When I turn on querylogging I get this about 200 times a second
Marcel> 08-Aug-1999 13:57:42.559 XX+/134.188.149.78/78.149.188.134.in-addr.arpa/PTR
Marcel> 08-Aug-1999 13:57:42.561 XX+/134.188.149.78/ts1-st16.oce.nl/A
Marcel> In which 134.188.149.78 is the ip adres and ts1-st16.oce.nl
Marcel> is the hostname running NAMED. Thus it performs about 200
Marcel> queries a second for its own ip/hostname.
Marcel> Can anybody help me?
This behaviour is to be expect if you use syslog for query logging and
you're getting upwards of 200 queries a second. Each query causes
named to send a log entry via syslog() to syslogd, causing a context
switch. syslogd looks at the message, figures out what to do with it
and writes it to some file. It gives up the CPU. Then named gets a
chance to run again (=> yet another process context switch) and 5ms
later it gets another query. So no wonder there's lots of system
activity, process switching and a busy syslogd process.
When there are lots of queries, it's better to set up a file channel
in named.conf for query logging. That way, there's no syslog or
context switching overheads and named can block-buffer the I/O to that
file. Something like this in the logging{} statement will give you the
channel query_logging {
file "/home/named/querylogs/qlog"
/* keep up to 7 qlog files of 60 Mbytes */
versions 7 size 60M;
/* log the time of each query */
print-time yes;
};
...
category queries {
/* send all query logs to out query_logging channel */
query_logging;
}
The next problem is to find out what's sending out these queries. top
will show you the busiest processes on your system. Then a system call
analyser like ktrace or truss can prove which of them is battering the
name server.
The + in the querylog entry is a new thing in BIND8.2. It indicates
the request had the recursion desired bit set. This usually indicates
a request from a resolver because name servers are generally able to
recurse for themselves. So it looks as though you've got a runaway
application stuck in an infinite loop doing a forward and reverse
lookup for the local machine.
Volker Borchert
1999-08-10 14:57:17 UTC
Permalink
In article <802.934288656 at kludge.mpn.cp.philips.com>, Jim Reid <jim at mpn.cp.philips.com> writes:

|> This behaviour is to be expect if you use syslog for query logging and
|> you're getting upwards of 200 queries a second. Each query causes
|> named to send a log entry via syslog() to syslogd, causing a context
|> switch. syslogd looks at the message, figures out what to do with it
|> and writes it to some file.

Some flavors of syslogd do an fsync() call after each and every write
so that chances of losing the last messages before the system finally
crashes are minimized. But fsync() is very expensive.

With some of them, this can be suppressed by inserting a '-' before
the filename in /etc/syslog.conf. With others, you can configure this
in /usr/src/usr.sbin/syslogd/syslogd.c ;-)

|> It gives up the CPU. Then named gets a
|> chance to run again (=> yet another process context switch) and 5ms
|> later it gets another query. So no wonder there's lots of system
|> activity, process switching and a busy syslogd process.
--
"I'm a doctor, not a mechanic." Dr Leonard McCoy <mccoy at ncc1701.starfleet.fed>
"I'm a mechanic, not a doctor." Volker Borchert <bt at teknon.de>
Jim Reid
1999-08-10 15:49:19 UTC
Permalink
Michael> Someone else mentioned earlier that it appeared syslog might be
Michael> recursively looking up the local hostname. Syslog should not be
Michael> recursively doing name lookups. This might be because they told syslog
Michael> to log DNS queries. Since syslog is doing a dns query that needs to get
Michael> logged to syslog, the cycle never ends.

This is a reasonable explanation for such a high query rate of forward
and reverse lookups of the local host name. A quick scan of the
syslogd code suggests it does a forward and reverse lookup for each
message if they arrive via syslogd's listener on port 514. This
doesn't happen if the messages get delivered through /dev/log.
Michael Voight
1999-08-10 16:13:05 UTC
Permalink
I suspect a misconfiguration of syslog. So, further discussion on this
should be directed outside of this BIND group. To BIND, the queries are
valid.
To the original poster: Your syslog entry should ONLY reference a file
path, not a system name.

Michael
Post by Jim Reid
Michael> Someone else mentioned earlier that it appeared syslog might be
Michael> recursively looking up the local hostname. Syslog should not be
Michael> recursively doing name lookups. This might be because they told syslog
Michael> to log DNS queries. Since syslog is doing a dns query that needs to get
Michael> logged to syslog, the cycle never ends.
This is a reasonable explanation for such a high query rate of forward
and reverse lookups of the local host name. A quick scan of the
syslogd code suggests it does a forward and reverse lookup for each
message if they arrive via syslogd's listener on port 514. This
doesn't happen if the messages get delivered through /dev/log.
Loading...