Wednesday, September 20, 2006
Nifty MDB tidbits
This stuff is actually kind of fun to play with. Yesterday, one of my colleagues was trying to figure out which ps option displays the actual time of day a process was started, for a process started long enough ago that ps shows only the month and day. I did some random poking with MDB (output slightly modified for formatting purposes; the 64-bit address is the address of the proc structure):
    server# mdb -k
    Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 ufs ip sctp usba fcp fctl qlc lofs md cpc fcip random crypto zfs logindmux ptm nfs ]
    > ::ps ! grep sshd
    R   1298     1 ffffffff8166b150 sshd
    R   6701  1298 fffffe82ca7d98f8 sshd
    R   6727  6701 fffffe81b3205230 sshd
    R  25827  1298 fffffe80eb61a6c8 sshd
    R  25857 25827 fffffe82c7be5a20 sshd
    R   8479  1298 fffffe82c7bdb988 sshd
    R   8485  8479 fffffe80ec1296f8 sshd
    > ffffffff8166b150::print -t proc_t ! grep time
    [ ... ]
    time_t tv_sec = 2006 Sep 7 15:00:44
    [ ... ]
    >

(This was obviously just random poking and not necessarily the most efficient way to get this information.)
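For comparison, here is a different route to the same answer that stays out of the kernel debugger: take the elapsed running time that ps reports (the POSIX `etime` field, in `[[dd-]hh:]mm:ss` form) and subtract it from the current time. This is only a minimal sketch, not what I actually did above; the function names `parse_etime` and `start_time` are made up for illustration, and it assumes your ps supports `-o etime`.

```python
from datetime import datetime, timedelta


def parse_etime(etime: str) -> timedelta:
    """Parse a ps 'etime' value of the form [[dd-]hh:]mm:ss."""
    etime = etime.strip()
    days = 0
    if "-" in etime:
        # Long-running processes carry a leading day count: "13-02:10:05".
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Short-lived processes omit the hours field; pad it with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def start_time(etime: str, now: datetime) -> datetime:
    """Estimate when a process started, given its elapsed time and 'now'."""
    return now - parse_etime(etime)
```

Something like `start_time(output_of_ps, datetime.now())`, where the elapsed time comes from `ps -o etime= -p 1298`, would then give the start time to the second, give or take the delay between running ps and doing the subtraction.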
And here was a nifty problem that I ran across. We'd lost contact with a Solaris 8 server -- ssh gave a connection refused, and there was no response on the console. Someone dropped it to the ok prompt and ran 'sync' to get a core dump. I was looking at the core dump and noticed this:
    server# mdb -k unix.0 vmcore.0
    Loading modules: [ unix krtld genunix ip usba lofs random nfs ptm ]
    > ::ps -f ! grep ssh
    Z    428     1 0000030004938048 /usr/local/sbin/sshd
    >

Hmm, sshd is a zombie, so it makes sense that there's no response on port 22. What about the console?
    > ::ps -f ! grep ttymon
    R   1152     1 0000030003f54040 /usr/lib/saf/ttymon
    Z   1144     1 0000030004cf1540 /usr/lib/saf/ttymon -g -h -p server console login: -T sun -d /de
    >

Hmm, that's odd. If the console ttymon died, init should have restarted it. That's what inittab says. So I look at init:
    > ::ps -f ! grep init
    Z      1     0 0000030001b81528 /etc/init -
    >
Ayup, that would be a problem. If init's a zombie itself, it's not likely to be doing its job of restarting ttymon.
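The same check can be scripted if you save the `::ps -f` output to a file. The sketch below is an illustration only: it assumes the trimmed five-column layout shown in this post (state, PID, PPID, proc address, command line), not the full set of columns that `::ps` really prints, and the names `Proc`, `parse_ps`, `zombies`, and `init_is_zombie` are invented for the example.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    state: str  # "R" for runnable, "Z" for zombie, and so on
    pid: int
    ppid: int
    addr: str   # address of the proc structure, as printed by ::ps
    args: str   # command line (may be truncated by the debugger)


def parse_ps(text: str) -> List[Proc]:
    """Parse trimmed '::ps -f' lines; skip prompts and anything malformed."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        state, pid, ppid, addr, args = fields
        procs.append(Proc(state, int(pid), int(ppid), addr, args))
    return procs


def zombies(procs: List[Proc]) -> List[Proc]:
    """Return every process whose state column is Z."""
    return [p for p in procs if p.state == "Z"]


def init_is_zombie(procs: List[Proc]) -> bool:
    """True if PID 1 itself is a zombie, so nothing will respawn children."""
    return any(p.pid == 1 and p.state == "Z" for p in procs)


# The lines from the crash dump discussed above.
SAMPLE = """\
> ::ps -f ! grep ssh
Z    428     1 0000030004938048 /usr/local/sbin/sshd
R   1152     1 0000030003f54040 /usr/lib/saf/ttymon
Z   1144     1 0000030004cf1540 /usr/lib/saf/ttymon -g -h -p server console login: -T sun -d /de
Z      1     0 0000030001b81528 /etc/init -
>
"""
```

Checking init first is the useful ordering here: a zombie sshd or ttymon is a symptom, while a zombie PID 1 explains why nothing got restarted.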