Friday, December 01, 2006
Turnstiles and MDB
In Solaris, turnstiles are a data structure used by some of the synchronization primitives in the kernel (specifically, mutexes and reader-writer locks). They're similar to sleep queues, but they also deal with the priority inversion problem by allowing for priority inheritance.
(Priority inversion occurs when a high-priority thread is waiting for a lower-priority thread to release a resource it needs. Priority inheritance is a mechanism whereby the lower-priority thread gets raised to the higher priority so that it can release the resource more quickly.)
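To make the idea concrete, here is a toy model of priority inheritance in Python. It is only a sketch of the concept described above, not the way the Solaris kernel implements turnstiles: the ToyThread and ToyLock classes are made up for illustration, the lock is handed directly to the highest-priority waiter, and a thread's inherited priority is simply dropped when it releases the lock.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyThread:
    """Hypothetical stand-in for a kernel thread; higher number = higher priority."""
    name: str
    base_pri: int
    inherited_pri: int = 0  # priority lent to us by threads waiting on our lock

    @property
    def effective_pri(self) -> int:
        return max(self.base_pri, self.inherited_pri)


@dataclass
class ToyLock:
    """Hypothetical lock that lends waiter priority to its owner."""
    owner: Optional[ToyThread] = None
    waiters: List[ToyThread] = field(default_factory=list)

    def acquire(self, t: ToyThread) -> bool:
        if self.owner is None:
            self.owner = t
            return True
        # The lock is held: queue the waiter and raise the owner's priority
        # so the owner can run and release the lock sooner.
        self.waiters.append(t)
        self.owner.inherited_pri = max(self.owner.inherited_pri, t.effective_pri)
        return False

    def release(self) -> Optional[ToyThread]:
        # Simplification: assume the old owner holds no other contended lock,
        # so it falls straight back to its base priority.
        self.owner.inherited_pri = 0
        if not self.waiters:
            self.owner = None
            return None
        # Hand the lock to the highest-priority waiter.
        nxt = max(self.waiters, key=lambda w: w.effective_pri)
        self.waiters.remove(nxt)
        nxt.inherited_pri = max((w.effective_pri for w in self.waiters), default=0)
        self.owner = nxt
        return nxt


if __name__ == "__main__":
    low, high = ToyThread("low", 60), ToyThread("high", 165)
    lock = ToyLock()
    lock.acquire(low)          # low-priority thread takes the lock
    lock.acquire(high)         # high-priority thread blocks behind it
    print(low.effective_pri)   # low now runs at the waiter's priority
    lock.release()
    print(low.effective_pri, lock.owner.name)
```

The priorities 60 and 165 are borrowed from the EPRI values that show up in the turnstile listings in this post; any two distinct numbers would do.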
There's more information in the Solaris Internals book about turnstiles, but I wanted to discuss looking at turnstiles with MDB. The ::turnstile dcmd will list all of the turnstiles on your live system or in your crash dump. For example:
> ::turnstile ! head
            ADDR             SOBJ WTRS EPRI             ITOR PRIOINV
ffffffff81600000                0    0    0                0       0
ffffffff81600040                0    0    0                0       0
ffffffff81600080                0    0    0                0       0
ffffffff816000c0                0    0    0                0       0
ffffffff81600100                0    0    0                0       0
ffffffff81600140                0    0    0                0       0
ffffffff81600180 ffffffff88b3fd48    0  165                0       0
ffffffff816001c0 ffffffff812bad80    0   60                0       0
ffffffff81600200 ffffffff852c5f98    0   60                0       0
>
You get the addresses of the turnstile and the synchronization object associated with it, the number of waiters, and priority information. So, let's look at the turnstiles with waiters:
> ::turnstile ! awk '$3 != 0'
            ADDR             SOBJ WTRS EPRI             ITOR PRIOINV
ffffffff812e3748 ffffffff8c1f9570    2  164                0       0
ffffffff887b8340 ffffffff8e3ea688    1  165                0       0
ffffffff8193a980 ffffffff8d8f86f8    6  164                0       0
fffffe84cd15fe08 ffffffff8e3ea680    2  164                0       0
>
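If you have saved the ::turnstile output to a file, the same filter is easy to do in a script. The following is a rough sketch in Python, assuming the six whitespace-separated columns shown above and assuming that WTRS and EPRI are printed in decimal. The function name turnstiles_with_waiters and the SAMPLE text (a few rows copied from the listings in this post) are just for illustration.

```python
# A few rows copied from the ::turnstile listings in this post.
SAMPLE = """\
ADDR SOBJ WTRS EPRI ITOR PRIOINV
ffffffff81600000 0 0 0 0 0
ffffffff81600180 ffffffff88b3fd48 0 165 0 0
ffffffff812e3748 ffffffff8c1f9570 2 164 0 0
ffffffff887b8340 ffffffff8e3ea688 1 165 0 0
ffffffff8193a980 ffffffff8d8f86f8 6 164 0 0
"""


def turnstiles_with_waiters(text):
    """Return one dict per turnstile row whose WTRS column is non-zero."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, MDB prompt lines, and anything not shaped like a row.
        if len(fields) != 6 or fields[0] == "ADDR":
            continue
        addr, sobj, wtrs, epri, _itor, _prioinv = fields
        if int(wtrs) != 0:  # assumption: WTRS is a decimal count
            rows.append({"addr": addr, "sobj": sobj,
                         "waiters": int(wtrs), "epri": int(epri)})
    return rows


if __name__ == "__main__":
    for row in turnstiles_with_waiters(SAMPLE):
        print(row["sobj"], row["waiters"], row["epri"])
```

Unlike the awk one-liner, this drops the header line, which awk lets through because the string "WTRS" is not equal to 0.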
We have the addresses of the synchronization objects, so let's look at one (I happen to know that these are all reader-writer locks):
> ffffffff8c1f9570::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffffff8c1f9570 ffffffff92480380  B111 ffffffff8888f7e0 (W)
                                    ||| ffffffffb0cea1e0 (W)
                 WRITE_LOCKED ------+||
                 WRITE_WANTED -------+|
                  HAS_WAITERS --------+
>
We can see who the owner is (the address of the data structure representing the thread), the value of the flags, and the list of waiters (if any). We know this is currently being held as a write lock because the WRITE_LOCKED flag is 1, but also because the OWNER/COUNT column lists the address of a thread rather than a count of readers.
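The reason one column can hold either an owner or a count is that the lock is a single word with the flags packed into its low bits. Here is a hedged sketch of that decoding in Python. The bit values are my recollection of the definitions in the OpenSolaris rwlock implementation, so treat them as assumptions and check them against your own source tree. The raw word 0xffffffff92480387 does not appear in the output above; I reconstructed it by combining the owner address with the three flag bits.

```python
# Assumed flag layout (low three bits of the lock word).
RW_HAS_WAITERS = 0x1
RW_WRITE_WANTED = 0x2
RW_WRITE_LOCKED = 0x4
RW_FLAG_MASK = RW_HAS_WAITERS | RW_WRITE_WANTED | RW_WRITE_LOCKED
RW_HOLD_COUNT_SHIFT = 3  # assumed: reader count lives above the flag bits


def decode_rwlock(word):
    """Split a raw rwlock word into flags plus either an owner or a reader count."""
    info = {
        "write_locked": bool(word & RW_WRITE_LOCKED),
        "write_wanted": bool(word & RW_WRITE_WANTED),
        "has_waiters": bool(word & RW_HAS_WAITERS),
        "owner": None,
        "readers": None,
    }
    if info["write_locked"]:
        # Write-locked: the remaining bits are the owning thread's address.
        info["owner"] = word & ~RW_FLAG_MASK
    else:
        # Otherwise the remaining bits count the readers holding the lock.
        info["readers"] = word >> RW_HOLD_COUNT_SHIFT
    return info


if __name__ == "__main__":
    info = decode_rwlock(0xffffffff92480387)  # reconstructed word, see above
    print(hex(info["owner"]), info["write_locked"],
          info["write_wanted"], info["has_waiters"])
```

This works because thread structures are aligned, so the low bits of a thread address are always zero and are free to carry flags.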
And given the owner, we can examine the stack:
> ffffffff92480380::findstack
stack pointer for thread ffffffff92480380: fffffe8000dc04b0
[ fffffe8000dc04b0 _resume_from_idle+0xde() ]
  fffffe8000dc04e0 swtch+0x10b()
  fffffe8000dc0500 cv_wait+0x68()
  fffffe8000dc0550 top_end_sync+0xa3()
  fffffe8000dc05f0 ufs_write+0x32d()
  fffffe8000dc0600 fop_write+0xb()
  fffffe8000dc0890 rfs3_write+0x3a3()
  fffffe8000dc0b50 common_dispatch+0x585()
  fffffe8000dc0b60 rfs_dispatch+0x21()
  fffffe8000dc0c30 svc_getreq+0x17c()
  fffffe8000dc0c80 svc_run+0x124()
  fffffe8000dc0cb0 svc_do_run+0x88()
  fffffe8000dc0ed0 nfssys+0x50d()
  fffffe8000dc0f20 sys_syscall32+0xef()
>
So this thread is holding a reader-writer lock, and it appears to be waiting on a condition variable (the cv_wait() call from top_end_sync() in the stack). As it turns out, nothing is ever going to call cv_broadcast() or cv_signal() on that condition variable, which means that the thread is never going to release that RW lock, either. Which is, of course, why I'm looking at this crash dump in the first place.