Sunday, August 20, 2006
Fun with DTrace -- tracing I/O through a pipe
I was tracking down a problem where one possible avenue of exploring what was going on was to use DTrace to track the data being written to and read from a pipe. I didn't go down that path, as I figured out the problem using other methods. But it seemed like a good exercise, so I followed through on it.
This wasn't just a simple case of grabbing the file descriptors returned by pipe() and tracking read()'s and write()'s made by that pid using those file descriptors. There was an intervening fork() and the dup()'s necessary to use the two ends of the pipe for stdin and stdout. I could have tracked the fork() and the dup()'s, but I chose to try it a different way.
A file descriptor is nothing more than an index into the file descriptor table associated with a process. Somewhere between the file descriptor being passed to the read() and write() system calls and the underlying function that actually reads to or writes from that file, the file descriptor must necessarily be mapped to the data structure representing the file. This happens in this file at the top of the read() code (and similarly for write()):
The return value of the getf() function (a struct file *) is what I'm looking for, the pointer to the process-independent data structure that I want to track. Once I have the file pointer from the pipe() system call, I can trace all the reads and writes using that pointer. Given that that value is only determined after entering read() or write(), I'll need to perform some speculation to determine which read()'s and write()'s I'm interested in.
So how do I determine the two corresponding values being used in pipe()? After having allocated two file pointers and two file descriptors, pipe() does the following to associate the file pointers to the file descriptors:
All very logical and straightforward. Based on the above, I wrote the following DTrace script. Note that it's actually not what I represented above, as this code assumes we know the pid of the process calling pipe(). But it's still reasonably representative, as tracing the reads and writes depends solely on the file pointer.
This wasn't just a simple case of grabbing the file descriptors returned by pipe() and tracking read()'s and write()'s made by that pid using those file descriptors. There was an intervening fork() and the dup()'s necessary to use the two ends of the pipe for stdin and stdout. I could have tracked the fork() and the dup()'s, but I chose to try it a different way.
A file descriptor is nothing more than an index into the file descriptor table associated with a process. Somewhere between the file descriptor being passed to the read() and write() system calls and the underlying function that actually reads to or writes from that file, the file descriptor must necessarily be mapped to the data structure representing the file. This happens in this file at the top of the read() code (and similarly for write()):
ssize_t read(int fdes, void *cbuf, size_t count) { [ ... ] if ((fp = getf(fdes)) == NULL) return (set_errno(EBADF));
The return value of the getf() function (a struct file *) is what I'm looking for, the pointer to the process-independent data structure that I want to track. Once I have the file pointer from the pipe() system call, I can trace all the reads and writes using that pointer. Given that that value is only determined after entering read() or write(), I'll need to perform some speculation to determine which read()'s and write()'s I'm interested in.
So how do I determine the two corresponding values being used in pipe()? After having allocated two file pointers and two file descriptors, pipe() does the following to associate the file pointers to the file descriptors:
setf(fd1, fp1); setf(fd2, fp2);
All very logical and straightforward. Based on the above, I wrote the following DTrace script. Note that it's actually not what I represented above, as this code assumes we know the pid of the process calling pipe(). But it's still reasonably representative, as tracing the reads and writes depends solely on the file pointer.
syscall::pipe:entry / pid == $target / { printf("Pid %d called pipe()\n", pid); self->trace_pipe = 1; self->first_fp = 0; } /* * Grab the file pointer from the second call to setf() within pipe(). Note * that we want to do this first to avoid hitting this predicate just after * we've set self->first_fp. */ fbt::setf:entry / self->trace_pipe && self->first_fp != 0 / { printf("fd == %d second fp == 0x%x\n",arg0,arg1); self->second_fp = arg1; } /* * Grab the file pointer from the first call to setf() within pipe(). */ fbt::setf:entry / self->trace_pipe && self->first_fp == 0 / { printf("fd == %d first fp == 0x%x\n",arg0,arg1); self->first_fp = arg1; } syscall::pipe:return / self->trace_pipe / { self->trace_pipe = 0; } /* * Note the speculation. On entry to these system calls, we only have a file * descriptor. We commit the speculation when we know that the fd maps to the * file pointer of interest. */ syscall::write:entry, syscall::read:entry { self->spec = speculation(); speculate(self->spec); printf("%s %d bytes %s fd %d\n",probefunc,arg2, (probefunc == "write" ? "to" : "from" ),arg0); } fbt::getf:return / self->spec && ( arg1 == self->first_fp || arg1 == self->second_fp ) / { commit(self->spec); self->spec = 0; } fbt::getf:return / self->spec && arg1 != self->first_fp && arg1 != self->second_fp / { discard(self->spec); self->spec = 0; }