Debugging SMF problems using svcs -x

Build 69 (Beta7) substantially revises the way Solaris starts up. /etc/rc2.d, /etc/rc3.d and the other rc directories are now legacy facilities: they still work, but they are no longer the preferred mechanism. We’ve moved to a much more powerful and flexible model, the Service Management Facility (SMF). If you haven’t heard about SMF yet, here are some resources:

One SMF feature I am personally proud of is svcs -x, which I helped to design.
It lets you ask the question “what’s wrong with my system?” and have SMF work
out the answer for you. That eases the debugging burden: in some earlier versions of SMF,
debugging required a fair amount of knowledge about how SMF itself is built, which seems
to have been Michael Hunter’s experience. Michael described a machine that was supposed
to be a NIS client (as well as a NIS master) but wasn’t working. At boot you see messages like this:

Oct 29 16:45:48 svc.startd[100004]: svc:/network/nis/server:default: Method
"/lib/svc/method/yp" failed with exit status 96.
[ network/nis/server:default misconfigured (see 'svcs -x' for details) ]

Ok! Let’s do what it says:

$ svcs -x
svc:/network/nis/server:default (NIS (YP) server)
State: maintenance since Fri Oct 29 16:45:48 2004
Reason: Start method exited with $SMF_EXIT_ERR_CONFIG.
See: ypstart(1M)
See: ypserv(1M)
Impact: 0 services are not running.

There are several things to notice here.

  • Simplified output. We try to tell you in plain language what is going on.

  • Documentation references. Perhaps you don’t know much about the NIS server;
    the referenced man pages can help.

  • Online knowledge base. Try clicking the link included in the output, and you’ll see
    what I mean. These articles will get richer over time.

  • An assessment of the impact of this problem. In this case, the system has determined
    that the NIS server failure isn’t affecting any other system services (although it
    might well be affecting remote NIS clients!).
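The Reason line ties back to the boot message: exit status 96 is the numeric value of $SMF_EXIT_ERR_CONFIG, which a method returns when it finds a configuration problem that retrying won’t fix, and svc.startd then puts the instance into maintenance. As a rough sketch of how a method script might report such a problem, consider the following. It is not the real /lib/svc/method/yp, whose source isn’t shown here; check_domain_dir is a hypothetical helper, and /var/yp/<domainname> is only an assumed example of the directory it checks.

```shell
#!/bin/sh
# Sketch only: NOT the real /lib/svc/method/yp.
# On Solaris, smf_include.sh defines SMF_EXIT_OK (0), SMF_EXIT_ERR_CONFIG (96)
# and the other method exit codes. Fall back to those values when the file
# is absent so the sketch also runs on other systems.
if [ -f /lib/svc/share/smf_include.sh ]; then
    . /lib/svc/share/smf_include.sh
fi
: "${SMF_EXIT_OK:=0}"
: "${SMF_EXIT_ERR_CONFIG:=96}"

# check_domain_dir (hypothetical helper): $1 is the directory the service
# needs, assumed here to be /var/yp/<domainname> for a NIS server.
check_domain_dir() {
    if [ ! -d "$1" ]; then
        # Method output goes to the service's log file.
        echo "$0: domain directory missing" >&2
        return "$SMF_EXIT_ERR_CONFIG"
    fi
    return "$SMF_EXIT_OK"
}

# A real start method would exit with the helper's status, for example:
#   check_domain_dir "/var/yp/`domainname`" || exit $?
```

Whatever a method writes to stdout or stderr ends up in the service’s log, much like the “domain directory missing” line in the log below.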

Now all we have to do is locate this service’s log file (in a future build
we expect svcs -x to print the log file location as well) and see what’s
going on:

$ tail /var/svc/log/network-nis-server:default.log
[ Oct 29 16:45:48 executing start method ("/lib/svc/method/yp") ]
/lib/svc/method/yp: domain directory missing
[ Oct 29 16:45:48 Method "start" exited with status 96 ]
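The log file name follows from the service’s FMRI: here, svc:/network/nis/server:default becomes network-nis-server:default.log under /var/svc/log. The small shell function below captures that convention as a sketch. smf_log_path is a hypothetical helper; it assumes a full instance FMRI (abbreviations such as nis/server are not expanded) and the default log directory.

```shell
#!/bin/sh
# smf_log_path (hypothetical helper): map a full instance FMRI to its
# log file, following the naming seen in this example.
smf_log_path() {
    # Drop the leading "svc:/" scheme.
    fmri=${1#svc:/}
    # Replace every "/" in the service name with "-".
    printf '/var/svc/log/%s.log\n' "$(printf '%s' "$fmri" | tr '/' '-')"
}

smf_log_path svc:/network/nis/server:default
# prints /var/svc/log/network-nis-server:default.log
```

With it, the earlier command becomes tail "$(smf_log_path svc:/network/nis/server:default)".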

Mystery solved: the NIS domain directory is missing. Here is one more, slightly
more interesting example. For some reason, the NFS server isn’t working. Why not?
This time I’ve passed the -v option, which asks for more verbose output:

$ svcs -xv
svc:/network/nfs/nlockmgr:default (NFS lock manager)
State: disabled since Fri Oct 29 17:07:34 2004
Reason: Disabled by an administrator.
See: man -M /usr/man -s 1M lockd
Impact: 1 service is not running:

Alternatively, we could ask the system specifically about the NFS server:

$ svcs -x nfs/server
svc:/network/nfs/server:default (NFS server)
State: offline since Fri Oct 29 17:07:49 2004
Reason: Service svc:/network/nfs/nlockmgr:default is disabled.
See: nfsd(1M)
Impact: 0 services are not running.

So this helpfully answers the question: because network/nfs/nlockmgr has
been disabled by an administrator, perhaps mistakenly, the NFS server cannot
start. The remedy is to enable the lock manager.
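Applying that remedy uses the standard SMF commands svcadm enable and svcs -x. The wrapper below is a hypothetical convenience function, offered only as a sketch: it enables the disabled dependency and then asks svcs -x about the service that was blocked.

```shell
#!/bin/sh
# enable_and_recheck (hypothetical wrapper): clear the root cause that
# svcs -x reported, then ask svcs -x about the blocked service again.
enable_and_recheck() {
    # $1: the disabled dependency, e.g. network/nfs/nlockmgr
    # $2: the service it was blocking, e.g. nfs/server
    svcadm enable "$1" || return 1
    svcs -x "$2"
}

# Usage on the system from this example:
#   enable_and_recheck network/nfs/nlockmgr nfs/server
```

svcadm enable returns without waiting for the service to come online, so the immediate svcs -x check may still show nfs/server as offline; run it again after a moment. Because nfs/server is already enabled (its state is offline, not disabled), SMF should start it on its own once the lock manager is online.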

There is a lot more we’ll be able to do with ‘svcs -x’ moving forward.
Please give us feedback about whether it helps you to solve systems administration