Wednesday, December 20, 2006


Unkillable processes

One of the blogs I read religiously is Ben Rockwood's. He has some interesting anecdotes (that's, in case you get the spam warning instead of the blog) about using OpenSolaris in production at Joyent, including one about an unkillable process.

I mailed the link to a couple of former colleagues, mostly because I thought they might be interested in the NFS-over-ZFS anecdote (given that they work at an ISP). Apparently I jinxed them -- just after getting in to work the next morning, they discovered an unkillable process on one of their Solaris 10 boxes. And it was also a process running in a zone, so it was impossible to reboot the zone to clear it up.

Sorry, guys.

(BTW, this appeared to be a deadlock situation. The process had two threads, one stuck in cv_wait() via exitlwps() and the other stuck in cv_wait() via tcp_close(). Given that I don't work there anymore, I couldn't really go crash-dump diving, but I'd bet that there were no other threads on the system that were going to call cv_signal() or cv_broadcast() on that particular CV, which would leave both threads asleep for good.)
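
For anyone who hasn't run into this failure mode, here is a rough user-space analogy in Python. It is not Solaris kernel code and not a reconstruction of what happened on that box: the thread names, the `run_demo` helper and the timeout are all made up for illustration. Each thread sleeps on its own condition variable, and a timeout stands in for "forever" so that the demo actually terminates.

```python
import threading

def wait_for_signal(cond, state, timeout):
    """Sleep on cond until state['ready'] is set; False means nobody woke us."""
    with cond:
        return cond.wait_for(lambda: state["ready"], timeout=timeout)

def run_demo(timeout=0.2, wake_close_path=False):
    # Two independent condition variables, loosely standing in for the CVs
    # that the two stuck threads were sleeping on (hypothetical names).
    cond_a, cond_b = threading.Condition(), threading.Condition()
    state_a, state_b = {"ready": False}, {"ready": False}
    results = {}

    def worker(name, cond, state):
        results[name] = wait_for_signal(cond, state, timeout)

    threads = [
        threading.Thread(target=worker, args=("exit_path", cond_a, state_a)),
        threading.Thread(target=worker, args=("close_path", cond_b, state_b)),
    ]
    for t in threads:
        t.start()

    if wake_close_path:
        # The healthy case: some other thread sets the condition and signals it.
        with cond_b:
            state_b["ready"] = True
            cond_b.notify_all()

    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_demo())                      # nobody signals either CV
    print(run_demo(wake_close_path=True))  # one waiter gets woken
```

With no signaller, both waits give up only because of the timeout. A wait with no timeout and no thread left to signal it never returns, and if that wait is also not interruptible by signals, kill -9 has nothing to act on.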

Goat entrails to undo the jinx are coming via FedEx.
Interesting anecdote by Ben Rockwood there.