Wednesday, November 29, 2006


Recommended reading: Postmortem Object Type Identification

This paper makes for some interesting reading: Postmortem Object Type Identification by Bryan Cantrill.

The paper presents a method for determining the type of arbitrary memory objects in a core dump. In other words, given some random address in a core dump, what's the type of the object?

(Which raises the question: why should you care about the type of whatever lives at some random address in memory? One very good reason is that if you're trying to track down a memory corruption problem, knowing the types of the objects near the corruption can help narrow down your search for the bad code.)

Determining the type of statically allocated objects is fairly easy -- you have the symbol table and type information. If the random address happens to match something in the symbol table, you have all the information you need. (I'm oversimplifying a bit, but it's an easy problem.)
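To make the static case concrete, here's a minimal sketch of that lookup. The symbol table contents, the names, and the `lookup_static` helper are all made up for illustration; a real debugger would read this information from the symbol table and type data in the dump.

```python
import bisect

# Hypothetical symbol table: (start address, size in bytes, name, type),
# sorted by start address. None of these entries come from a real binary.
SYMBOLS = [
    (0x1000, 64, "global_config", "config_t"),
    (0x2000, 16, "table_lock", "lock_t"),
    (0x3000, 256, "timer_table", "timer_t[16]"),
]
STARTS = [entry[0] for entry in SYMBOLS]


def lookup_static(addr):
    """Return (name, type, offset) if addr falls inside a known symbol, else None."""
    # Find the last symbol that starts at or before addr.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None
    start, size, name, type_name = SYMBOLS[i]
    # The address only matches if it lies within the symbol's extent.
    if addr < start + size:
        return name, type_name, addr - start
    return None
```

An address in the middle of an object resolves to the enclosing symbol plus an offset, and an address in a gap between symbols resolves to nothing, which is exactly where the dynamically-allocated case begins.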

The harder problem is determining the type of a dynamically-allocated object. You don't have a symbol table handy to tell you what all the locations in memory are, because these objects didn't exist at compile time. You could record the types of objects at run-time as you allocate them, but that gets hairy quickly. The likeliest solution would involve modifying your memory allocation library to store this information, but then you would need to pass type information to the memory allocation routine, which might not be feasible. (Although it should be noted that the kernel slab allocator in Solaris provides some of this information, as objects allocated from certain object caches are of known type.)

This paper presents a method for inferring the types of dynamically-allocated objects. At the core of this method is a fairly standard iterative graph-traversal algorithm for propagating information from nodes (i.e., memory objects) of known type to nodes of unknown type. Given that almost all dynamically-allocated objects are rooted in statically-allocated objects, the algorithm can provide very good coverage of dynamically-allocated objects. (And the implementation in MDB makes use of the object-cache type knowledge mentioned above as an optimization during initialization.)
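Here's a rough sketch of the propagation idea, assuming a toy memory image. This is not the paper's algorithm or the MDB implementation: the type descriptions, the addresses, and the `propagate` function are hypothetical, and the sketch ignores conflicting inferences and the C-specific complications entirely.

```python
from collections import deque

# Hypothetical type descriptions: for each struct type, the offsets of its
# pointer members and the type each member points to.
POINTER_MEMBERS = {
    "list_head": {0: "node"},
    "node": {0: "node", 8: "payload"},
    "payload": {},
}

# Hypothetical memory image: object base address -> {offset: pointer value}.
MEMORY = {
    0x1000: {0: 0x5000},             # statically-allocated list_head
    0x5000: {0: 0x5100, 8: 0x6000},  # heap node
    0x5100: {0: 0, 8: 0x6100},       # heap node, NULL next pointer
    0x6000: {},
    0x6100: {},
    0x7000: {},                      # not reachable from any known object
}


def propagate(known):
    """Spread type information from objects of known type to the objects they point to."""
    inferred = dict(known)
    work = deque(known)
    while work:
        addr = work.popleft()
        members = POINTER_MEMBERS.get(inferred[addr], {})
        for offset, target_type in members.items():
            target = MEMORY.get(addr, {}).get(offset, 0)
            # Only label objects that exist and have not been typed yet.
            if target in MEMORY and target not in inferred:
                inferred[target] = target_type
                work.append(target)
    return inferred
```

Starting from the single static root, `propagate({0x1000: "list_head"})` labels both heap nodes and both payloads, while the object at `0x7000` stays unknown because nothing of known type points to it.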

The C language allows for uses that reduce the effectiveness of the algorithm, but the paper presents some heuristics to handle those. The paper also presents some interesting applications of this method that aren't directly related to debugging memory corruption.

Definitely worth reading.
