2010 California Propositions Guide

Back in 2008, the last time we had a general election in California, I posted a simple Guide to 2008 California Ballot Propositions.  For the 2010 election (coming this Tuesday) I have moved this information to a new website, www.propgrok.org.  I also added some features to allow you to share your stances on the propositions via facebook or twitter.  If you find the site useful, please mention the site on facebook or twitter– I’d really appreciate it.

(Please note that propgrok.org is not affiliated with Oracle; views expressed therein are my own).

An irritating bug in bash

Here’s what kernel developers talk about at coffee: bugs in our command shells.  At least, that was the topic the other day when Stephen, Dave and I were complaining about the various troubles we’d had with different command shells.

While others moved to zsh some years ago, I have been a bash user since kicking the tcsh habit.  But for years I have been plagued by a subtle nuisance in bash: sometimes it doesn’t catch terminal window resizes properly.  The result is that command line editing works very poorly until bash finally figures this out. After a while, I worked out that this behavior happens only when the window size change happens while you’re in some application spawned by bash.  So if you’re in an editor like vim, change the window size, and then exit (or suspend) the editor, bash will be confused about the terminal size.

While this has always annoyed me, it never quite reached the threshold for me to do anything about it.  But recently it has been bugging me more and more.  After we returned from coffee, I dug into the bash manual and discovered a little-known option, the checkwinsize builtin.  In a nutshell, you can set this shell option as follows:

    shopt -s checkwinsize

which the bash manual says: If set, Bash checks the window size after each command and, if necessary, updates the values of LINES and COLUMNS. Sounds great!  As an aside, I think that as a modern shell, bash should set this option by default.  (Others think so too).

With much self-satisfaction I set this option and got ready for line editing bliss.  But, no joy.  I checked and rechecked, and finally started using truss, and then DTrace, to try to understand the problem.  After some digging I eventually discovered the following bug in the shell.  Here’s the meat of the writeup I submitted to the bash-bug list:

On Solaris/OpenSolaris platforms, I have discovered what I believe is a
bug in lib/sh/winsize.c.
I discovered with a debugger that the get_new_window_size() function
has no effect on Solaris.  In fact, here is what this file looks like if
you compile it:
$ dis winsize.o
disassembly for winsize.o
section .text
get_new_window_size()
get_new_window_size:     c3                 ret
That's it-- an empty function.  The problem is that the appropriate header
file is not getting pulled in, in order to #define TIOCGWINSZ.
As a result, even with 'shopt -s checkwinsize' set on Solaris, bash
does not check the win size on suspend of a program, or on program
exit.  This is massively frustrating, and I know of several Solaris
users who have switched to zsh as a result of this bug.
I have not tried bash 4.0, but looking at the source code, it appears
that the bug is present there as well.
Fix:
I added an ifdef clause which looks to see if the HAVE_TERMIOS_H define
is set, after the #include of config.h.  If it is, then I #include the
termios.h header file.  This solves the problem, which I confirmed by
rebuilding and dis'ing the function.  I also ran my recompiled bash
and confirmed that it now worked correctly.

Hopefully the bash maintainers will take note and fix this bug.  In the mean time, I’m going to see if we can get the fix for this applied to the Nevada (and hence, OpenSolaris) version of bash.

Update: The bash maintainers have fixed this bug in the following patch to bash 4.x.  Hurray!

Speeding to a Halt

<!– –>On Sunday I committed changes into Nevada build 110 (and hence, into OpenSolaris 2009.next) to improve the speed at which OpenSolaris systems shutdown.  On a generic x86 test system, it used to take about 41 seconds to shut the system down starting from the time you typed init 6, shutdown(1m), or pressed the shutdown button in GNOME.  This form of shutdown is known as gentle shutdown because SMF takes care to stop each software service in dependency order.  In contrast, when you use reboot, halt, or poweroff, you’re using what I’ll call violent shutdown.  In the latter, the boot archive is brought up to date and that’s about it.  It has traditionally been much, much faster than gentle shutdown.

This relatively long shutdown time has had an interesting effect: Solaris developers almost universally cheat, and use violent shutdown.  Typing reboot is terribly ingrained in my fingertips.  This is bad, because it means less test coverage for the shutdown method which actually matters to customers.  I recently began to be bothered by this because the GNOME gui uses init 6, and so shutting down via the nice shiny "Shut Down" button is also terribly slow.

On something of a whim, I dusted off our variant of the bootchart project which we ported in-house several years ago to get some initial information on what was happening during system shutdown.  Click here to see a graphical representation of a typical system shutting down (note: big image).  To read the image, note that time is on the X axis, from left to right.  Processes are represented as horizontal bars representing their duration.  At the rightmost side of the chart, the system has stopped.

In the image I’ve highlighted a couple of points of interest:

  • The pppd shutdown script seems to sleep 1, always, even if you aren’t using ppp; since pppd isn’t converted to SMF (bug 6310547), we will try to stop it on all systems on every shutdown.
  • The wbem service seems to sleep 1 while shutting down, and the the webconsole service takes a while to shutdown.  However, these services are present only on Nevada, and not on OpenSolaris, so I chose not to pursue trying to fix them.
  • The deferred patching script, installupdates is really slow.  And needlessly so– it can run in a few milliseconds with a simple fix; I filed a bug.
  • There are some long calls to /usr/bin/sleep.  In the chart linked above, you can see svc-autofs, rpc-bind, and svc-syseventd each taking five seconds to stop.  Five seconds is a really long time!
  • There’s a call to something called killall near the end of shutdown.  Then, 5 seconds later, another.  Then, 10 seconds later, things proceed again.  I wondered what the killall was all about?  Did it really need 15 seconds to do its work?

After a bit of effort (ok, a lot of effort), I’ve cleaned up these, and some other problems I spotted along the way.  It turns out that the five second delays are from some poor quality shell code in /lib/share/smf_include.sh:

smf_kill_contract() {
...
# Kill contract.
/usr/bin/pkill -$2 -c $1
...
# If contract does not empty, keep killing the contract to catch
# any child processes missed because they were forking
/usr/bin/sleep 5
/usr/bin/pgrep -c $1 > /dev/null 2>&1
while [ $? -eq 0 ] ; do
...

Ugh. So this shell function rather bizarrely always waits five seconds, even if the contract empties out in 1/10th of a second!  I fixed this to have a smarter algorithm, and to keep checking at more frequent intervals (every 0.2 seconds).

I discovered that the calls to killall(1m) were really ancient history, and probably did not need to occupy 15 seconds worth of system shutdown.  I have shortened the interval substantially.

Another problem we faced was that, in the last moments before shutdown, startd runs some commands using system(3c).  This can be a problem if one of those commands, for some reason, wedges up.  So, I’ve replaced the calls to system with calls which timeout after a set number of seconds.  This is some nice insurance that the system continues to make progress as it shuts down.  Since I wound up with so much extra time available at shutdown, I’ve taken the chance to add a call to lockfs(1m) in an effort to get as much on-disk UFS consistency as possible.

So, here is the intermediate picture.  I’ve slightly revised the shutdown messaging, as well, to include a datestamp and a measurement of how long the shutdown took:

    svc.startd: The system is coming down.  Please wait.
svc.startd: 83 system services are now being stopped.
Mar  5 19:43:34 soe-x4100-3 syslogd: going down on signal 15
svc.startd: Killing user processes.
Mar  5 19:43:40 The system is down.  Shutdown took 17 seconds.

But wait, there’s more!  On OpenSolaris, we don’t have the time consuming wbem or webconsole services.  So, we can disable those and try again.  And, we use ZFS, for which the time consuming lockfs call at the end of shutdown is a no-op (on UFS, it takes at least two seconds).  This slimmed down stack results in an impressive shutdown time:

    Mar  6 02:51:51 The system is down.  Shutdown took 7 seconds.

And here is what it looks like.  If you want to see the complete set of changes, the codereview is also available.  As you can see, revising the way we kill off processes at the end of the life of the system is the big unrealized win.  And doing so would likely shave about 3 more seconds off, for a gentle shutdown of 4-5 seconds.  I ran out of time to do that this time around.

Some caveats:

  • You mileage may vary: you might run a different mix of services on your system, and perhaps one of those has a slow shutdown method which will gum up the works.  If you want to test how long a service takes to stop, try ptime svcadm disable -s <servicename>.
  • Your performance improvement is likely to be less dramatic on systems with less available parallelism.  Most of my test systems have two or four CPUs.

I should add a coda here: this work is greatly improved by recent bootadm performance work by Enrico.  While building the boot archive is still sometimes triggered on shutdown, it takes a lot less time than it did previously.

I had a good time working on this project; I hope you’ll enjoy using it.

California Proposition Guide

I grew up in Pennsylvania, a state without voter propositions.  Upon arriving in California some years ago, I found the "prop" system to be complicated and kind of a big pain.  Over the years I’ve taken to making myself a chart of what the various useful (to me) sources are saying about the propositions.  This time I am sharing it here as I hope it will be useful to others.  Any mistakes are probably the result of transcription error on my part.  I’ll be happy to correct them.  In compiling this, I picked my region’s two largest papers, plus the LA Times.  You’ll also find the recommendation of the California League of Women Voters and the "party lines" of the two major parties.  Whatever you do: go out and vote!

Useful Links:

Prop Pro / Con / BallotPedia CA LWV SF Chron SJ Merc LA Times CA Dem Ca GOP
1A High Speed Rail www.californiahighspeedtrains.com
www.derailhsr.com
ballotpedia:Proposition_1A
Yes Yes Yes Yes Yes No
2 Standards for Confining Farm
Animals
yesonprop2.com
www.safecaliforniafood.org
ballotpedia:Proposition_2
No Stance No Yes No Yes No
3 Children’s Hospital Bond Act. Grant Program. www.imaginewithus.org
??
ballotpedia:Proposition_3
Yes Yes Yes Yes Yes No
4 Abortion Waiting Period/Parental Notification www.yeson4.net
http://www.noonprop4.org/
ballotpedia:Proposition_4
No No No No No Yes
5 Nonviolent Drug Offenses. Sentencing, Parole and Rehabilitation. www.prop5yes.com
http://www.noonproposition5.com/
ballotpedia:Proposition_5
Yes No No No Yes No
6 Police and Law Enforcement Funding. Criminal Penalties and Laws. (“Runner Initiative”) www.safeneighborhoodsact.com
www.votenoprop6.com
ballotpedia:Proposition_6
No No No No No Yes
7 Renewable Energy Generation www.yeson7.net
www.noprop7.com
ballotpedia:Proposition_7
No No No
No No No
8 Eliminates Right of Same-Sex Couples to Marry www.protectmarriage.com
www.noonprop8.com
ballotpedia:Proposition_8
No No No No No Yes
9 Criminal Justice System. Victims’ Rights. Parole. www.friendsofmarsyslaw.org
www.votenoprop9.com
ballotpedia:Proposition_9
No No No No No Yes
10 Alternative Fuel Vehicles and Renewable Energy. Bonds. www.prop10yes.com
www.noonproposition10.org
ballotpedia:Proposition_10
No No No No Neutral No
11 Redistricting Reform: California Voters FIRST Act yesprop11.org
www.noonprop11.org
ballotpedia:Proposition_11
Yes Yes Yes Yes No Neutral
12 Veterans’ Bond Act of 2008 ???
???
ballotpedia:Proposition_12
No Stance Yes Yes Yes Yes Yes

BigAdmin Updates Zones Page

The zones team recently identified was that our web presence has been lacking.  A quick survey surprised us: the relatively unloved BigAdmin: Solaris Containers page generates a lot of daily visits.  We realized that this is because it is the top search result if you google for "Solaris Zones".  So Penny and Robert (one of the BigAdmin maintainers) set out to make improvements.  The new page is now posted.  Take a look!  Also, if you have suggestions for further improvements, or pointers to materials we should add, please leave them in my blog’s comments section.

Blastwave on Solaris 8 Containers

I just got word of a great new Solaris 8 Containers success story on sun.com, from blastwave.org.  I think my favorite quote from Dennis Clarke (Founder of Blastwave) is:

"I virtualized critical Solaris 8 production servers and nobody
noticed
.  I literally shut the server down, backed it up, created
a Solaris 8 Container, restored the environment, and brought the server
back up. The process was simple, transparent, and completely flawless."

Our team did a lot of challenging work to ensure this sort of customer experience.  It’s gratifying to see it pay off for an important contributor to the Solaris ecosystem.

So if you’ve been trying to persuade your PHB that you should give Solaris 8 Containers a try, here is a great reference!