Friday, October 10, 2008


Measuring the length of a linked list

How do you measure the length of a linked list? That's easy enough, you start at the beginning and follow the linked list, counting as you go. Of course, you have to make sure that there's not a cycle in your list, but there's a classic solution to that problem[1].

Okay, so what do you do if this is a linked list in the Solaris kernel, running on a production machine, and you want to know how long it is? Hmm, that's a bit more difficult. The linked list is likely changing, so you can't just pop open mdb, walk the list, and count. It would be convenient to be able to grab the lock on that linked list to guarantee that it doesn't change, and then walk it. Fortunately, there aren't any tools to do that, as you could bring a box to its knees while doing this (performance wise, if not actually crashing the box.) (Theoretically, you might be able to do it with "mdb -kw" if you happen to get lucky setting the right bit at just the right time to make it look as if that lock is held, but I wouldn't be willing to bet on someone actually succeeding at doing this.)

So here's a method that a colleague and I came up with to give an estimate of the maximum and average length of that linked list over a sampling period. (In this case, we're looking at a hash table, the sleep queue used for threads who call cv_block() on a condition variable.):

#!/usr/sbin/dtrace -s


        * Hash function used for this hash table.  I'm adding 1 so I can
        * reserve 0 as a special value.
       self->bucket = (((uintptr_t)(arg0) >> 2) + ((uintptr_t)(arg0) >> 9) & 511) + 1;

       @r[self->bucket] = max(length[self->bucket]);
       @q[self->bucket] = avg(length[self->bucket]);
       bucket[arg1] = self->bucket;
       self->bucket = 0;

/ length[bucket[arg1]] > 0 /

       trunc(@q, 30);

Ignoring some of the particulars, we're keeping a length value for each hash bucket. When we insert something, we increment that value. When we delete something, we decrement it. We then keep track of the max and average for that length. Simple enough. We also keep track of which bucket this thread is going into (arg1 to both sleepq_insert() and sleepq_unlink() is a pointer to the thread structure) so that we know which length variable to decrement. (Note that we can't do this as a thread-local variable because sleepq_insert() and sleepq_unlink() won't happen in the same thread.)

Something to point out is that this is only an approximation to the length of any of these linked lists. What this actually tells you is the maximum and average growth in the length of those linked lists. It won't include the length of any of those lists at the time that the script starts sampling. But as an approximation, it's good enough, especially in this case, because the linked lists in a hash table should never really be longer than one or two and certainly never as long as, say, 83.

[1] Okay, if that link has since become broken, it points to this comment from the code for DTrace:

 * We want to have a name for the minor.  In order to do this,
 * we need to walk the minor list from the devinfo.  We want
 * to be sure that we don't infinitely walk a circular list,
 * so we check for circularity by sending a scout pointer
 * ahead two elements for every element that we iterate over;
 * if the list is circular, these will ultimately point to the
 * same element.  You may recognize this little trick as the
 * answer to a stupid interview question -- one that always
 * seems to be asked by those who had to have it laboriously
 * explained to them, and who can't even concisely describe
 * the conditions under which one would be forced to resort to
 * this technique.  Needless to say, those conditions are
 * found here -- and probably only here.  Is this the only use
 * of this infamous trick in shipping, production code?  If it
 * isn't, it probably should be...

Comments: Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?