More on Bootchart for Solaris

So Eric Let the cat out of the bag
on our reworked BootChart for Solaris. Eric
posted one image
; here is another: the bootup
of a single Solaris Zone
. I’m pretty happy with this, as we boot in only about 7 seconds.

This was an interesting experience because Eric and I had not previously worked together very closely.
I had a great time doing this, and because Eric immediately stuck everything into a
Teamware workspace, we were able to work
simultaneously. Eric and I both worked on the D scripting, and somehow a joke about “wouldn’t it
be nice if the script output XML?” turned into our strategy. This turned out to be a good decision,
as I hate writing parsers; instead we just let SAX do the work. We were able to maintain the split of having boot-log generation in one self-contained
component, and the log processing into another. Because the XML logs are in an easily parsed format
(as opposed to parsing the output of top and iostat), they can be useful to
anyone doing boot analysis on Solaris. We’ve already had some such requests. I’m sure Eric
will have more to say about the implementation so I’ll leave it to him, except to say that some
of the visual design changes can be blamed on me, taking inspiration from
Edward Tufte’s work.

Something else which fell out of this experience is that it’s easy to use the log gatherer on any collection
of processes which start up (as we saw in the zones example, above). We hope that this will be helpful
in highlighting performance wins in the startup of complex ISV programs.

Following the experience of Linux developers, we’ve also found a series of bugs in Solaris with this tool.
Let’s start with an easy one, and the first one I found. The bug is visible in this chart from xanadu,
Shuttle SN45G4.
we don’t have support (sadly) for the on-board ethernet on this box, I had inserted another network card
(I won’t name the vendor, as I don’t want to put them on the spot). If you look carefully at
this bootchart, you’ll see that the ifconfig process is spending a lot of time
on the CPU. What’s up with that? A brief investigation with DTrace made it clear that
the driver had code like the following (shown here reduced as pseudo-code) in the
attach(9E) codepath (attach is the process by which a driver begins controlling a
hardware device):

for (i := 0 to auto_negotiation_timeout) {
if auto_negotiation_complete()
return success;

Which all looks fine except that wait_milliseconds() (a function defined by the driver) is a wrapper around
(“busy-wait for specified interval”). Busy is of course the problem. drv_usecwait
is really more about waiting for short intervals for various information to become ready in
various hardware registers. Busy-waiting 100 milliseconds at a time is practically forever, and
ties up the CPU just spinning in a loop. The authors almost certainly meant to
use delay(9F). I filed
a bug, and hopefully we’ll have it fixed soon (since this driver comes to us from a third party,
they request that we let them make the code changes). Fun, eh?

Another two issues we spotted concern inetd, which has been rewritten from scratch in
Solaris 10; it is now a delegated restarter, which basically means that it take some of
its direction from the system’s master restarter (svc.startd). The behavior we noticed
sticks out on any of the boot charts, including the
zone boot chart
mentioned above: inetd is
starting a lot of very short-lived ksh processes. Why? When I first spotted this,
I used DTrace to work out the answer, as follows:

# dtrace -n 'proc:::create/execname=="inetd"/{ustack();}'
restart inetd
0  12188                     cfork:create`__fork1+0x7`wordexp+0x16f

Ahh, so we stumble upon an (embarrassing) implementation artifact of libc— it uses
ksh to help it implement the libc routine wordexp(3c). So, every time inetd needs
to wordexp() something, we wind up running a ksh. We can also see that this is not severely
impacting performance, but we would like to get this fixed. Personally, I’d like to see
wordexp() fixed to not rely upon ksh at all.

Another somewhat more subtle issue is something that SMF engineers like Liane are still looking at. It’s also visible in
the zone boot chart. It appears that some services (such as nfs/client) are delayed
in coming on-line because it is taking the startd/inetd cabal a while to
mark their dependent services as online, even though that shouldn’t really entail much work.
We can see this as follows:

$ svcs -l nfs/client
fmri         svc:/network/nfs/client:default
name         NFS client service
dependency   optional_all/none svc:/network/rpc/keyserv (online)
dependency optional_all/none svc:/network/rpc/gss (online)
dependency require_all/refresh svc:/milestone/name-services (online) $ svcs -l network/rpc/gss fmri svc:/network/rpc/gss:default name Generic Security Service enabled true state online next_state none state_time Fri Jan 07 18:22:32 2005
restarter svc:/network/inetd:default
dependency require_all/restart svc:/network/rpc/bind (online) dependency optional_all/none svc:/network/rpc/keyserv (online)

Since nfs/client depends upon GSS, it can’t be started by startd
until GSS is online.
Liane and Jonathan
have offered up some theories about why this is happening, but we’ve all been engaged on
higher priority work, and so we haven’t had much time yet to dig deeper.
This serialization point appears to be costing us roughly 1 full second on boot, so
it’s something we need to look at further. Have a great weekend folks!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s