Sunday, June 04, 2006
Non-SMF restarter
In Solaris 10, the Service Management Facility (SMF) provides some nice fault-tolerance features. If a service is killed for whatever reason, SMF will restart it (assuming you've configured SMF to do so for the service in question). There may be cases in which you can't move a service under SMF, but you'd still like it to be restarted when it dies. For this, we have ctrun(1).
At its core, ctrun is very simple: it creates a process contract and runs a specified command in that process contract. (I wrote a little about process contracts here, and there are always the man pages: contract(4), process(4), et al.) It can also act as a restarter for the process if you tell it to. For example,
    ctrun -r 0 -t -f hwerr,core,signal /usr/local/sbin/food
will run the foo daemon and restart it if it dies from a hardware error, if it dumps core, or if it receives a fatal signal. The '-r 0' tells ctrun to attempt to restart it an unlimited number of times, and the '-t' tells ctrun to transfer any inherited subcontracts to the new process contract when it restarts food.
So why do we need ctrun if we have SMF? Well, for one, you may not be the administrator of the system you want to run a restarting daemon on. Or it might not be a daemon at all; it might simply be a long-running calculation that you want to start on Friday afternoon before you leave for the weekend (and that you've written in such a way that it frequently saves state, so that it can pick up close to where it was when it terminated). Or you might be the administrator, and you might be working with production daemons, but you might have an in-house rc system that you're not willing to scrap to move everything under SMF (especially given that you're a mixed Solaris and Linux shop and don't have SMF under Linux).
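For the in-house rc system case, the ctrun invocation from the example could be wrapped in a start function along these lines. This is only a rough sketch: the function names food_cmdline and food_start are made up for illustration, the daemon path and event list are just the ones from the example, and it assumes a POSIX-compatible shell rather than the legacy Bourne shell.

```shell
# Sketch of an rc-style start function built around ctrun(1).
# DAEMON and FATAL_EVENTS come from the example in the post; the
# function names are hypothetical.

DAEMON=/usr/local/sbin/food
FATAL_EVENTS=hwerr,core,signal

# Print the command line we intend to run, so it can be logged or checked.
food_cmdline() {
    # -r 0: restart indefinitely; -t: transfer inherited subcontracts
    echo "ctrun -r 0 -t -f $FATAL_EVENTS $DAEMON"
}

# Start the daemon under ctrun in the background, if ctrun is available.
food_start() {
    if command -v ctrun >/dev/null 2>&1; then
        ctrun -r 0 -t -f "$FATAL_EVENTS" "$DAEMON" &
    else
        # e.g. on the Linux half of a mixed shop, where there is no ctrun
        echo "ctrun not available; would have run: $(food_cmdline)" >&2
        return 1
    fi
}
```

The rc script's start case would then call food_start, and on a host without ctrun it fails with a message instead of silently starting an unsupervised daemon.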