Friday, May 26, 2006

 

Process contracts in Solaris 10

The Service Management Facility (SMF) is a new feature in Solaris 10 that's replacing the old System V /etc/rc*.d way of starting and stopping system processes. Of course, it's not just a new way to start and stop system processes, it's a way to manage services. Aside from other features, SMF can restart daemons that are killed for whatever reason. I was discussing this with a colleague, and he wanted to know how this works. Obviously, it could simply use the fact that init(1M) is the parent of all daemons and piggyback on the wait(2) loop that reaps children as they die. But that wouldn't be useful in supporting other features of SMF, such as having delegated restarters.

The communication mechanism that allows the daemon restart and other features of SMF to work is process contracts. A process contract is essentially another relationship between processes, similar to the parent-child relationship between processes in Unix, but also similar to process groups, as contract IDs aren't necessarily unique. Every process has an associated contract ID, and every contract has a contract holder. The contract holder receives information about certain things that happen to processes with that contract ID; namely process creation and normal termination (processes added to or removed from the contract), the last process to exit a contract (the contract is now empty), and other fatal events (a fatal signal, a core file was generated, or the process was killed due to an uncorrectable hardware error.)

One of the major benefits of process contracts is that it allows one process to define a fault boundary around a set of subprocesses. What this means in practice is that you have finer-grained control over the fault boundaries on your server. Prior to Solaris 10, the fault boundary essentially included every process on running on a server. In the case of an uncorrectable memory error, for example, a server would either panic (the error was in kernel address space) or "gracefully" reboot the server (if the error was in user space.) (See more here.) Given that the fault boundary included all processes, this was the only option.

With process contracts, however, there's finer-grained control over the fault boundaries. The boundary can be defined to include, for example, httpd and any of its child processes. If an uncorrectable memory error kills the parent httpd process, SMF can be notified and restart that process. There's no need to reboot the server. And given that SMF allows us to specify dependencies between services, there's no need to do anything with the sendmail process if there's no dependency on httpd.

(This is by no means an exhaustive look at process contracts. More information can be found in the man pages (contract(4), process(4), et al.) and other documentation.)


Comments:
Nice colors. Keep up the good work. thnx!
»
 
Great site loved it alot, will come back and visit again.
»
 
Your are Excellent. And so is your site! Keep up the good work. Bookmarked.
»
 
OK I work for Sun but a great blog keep going.
 
Post a Comment



<< Home

This page is powered by Blogger. Isn't yours?