Thursday, January 25, 2007
Microstate accounting in Solaris 10 and CPU latency
Solaris 10 included some pretty major features such as ZFS, Zones, and DTrace. There were also other features that weren't quite as major but are pretty nifty in their own right. One of these is that microstate accounting is turned on by default in Solaris 10. (It was optional in earlier releases of Solaris.)
Eric Schrock blogged about microstate accounting here.
Using 'prstat -m' you can see some of this information. For example:
   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
  9559 mynhier    27 3.9 0.0 0.0 0.0 0.0 0.0  69   0 186  3K   0 ube/1
  9562 mynhier    18 6.6 0.0 0.0 0.0 0.0 0.0  76   0  92  7K   0 ube/1
  9555 mynhier    21 2.7 0.0 0.0 0.0 0.0 0.0  76   0 143  2K   0 cc1/1
  9564 mynhier    14 8.8 0.0 0.0 0.0 0.0 0.0  77   0 124 13K   0 ube/1
  9551 mynhier    18 2.2 0.0 0.0 0.0 0.0 0.0  80   0 123  2K   0 cc1/1
  9549 mynhier    17 2.1 0.0 0.0 0.0 0.0 0.0  81   0 117  2K   0 cc1/1
  9560 mynhier    16 2.6 0.0 0.0 0.0 0.0 0.0  82   0 350  6K   0 iropt/1
  9507 mynhier   5.9 1.6 0.0 0.0 0.0 0.0  59  34  14  31  2K   0 dmake/1
  9508 mynhier   3.4 1.0 0.0 0.0 0.0 0.0  75  21  14  17  1K   0 dmake/1
  9489 mynhier   2.9 0.7 0.0 0.0 0.0 0.0  96 0.8  15  13  1K   0 dmake/1
  9490 mynhier   2.9 0.7 0.0 0.0 0.0 0.0  96 0.8  15  13  1K   0 dmake/1
   479 mynhier   0.7 0.0 0.0 0.0 0.0 0.0  98 1.7  28   0 113  13 Xorg/1
  9543 mynhier   0.1 0.4 0.0 0.0 0.0 0.0  67  32   7   1 215   0 cc/1
  9536 mynhier   0.1 0.4 0.0 0.0 0.0 0.0  79  20   6   3 207   0 cc/1
  9527 mynhier   0.1 0.4 0.0 0.0 0.0 0.0  95 4.4   6   5 203   0 cc/1
  9554 mynhier   0.1 0.2 0.0 0.0 0.0 0.0  86  14   2   0 221   0 gcc/1
  9550 mynhier   0.1 0.2 0.0 0.0 0.0 0.0  85  15   2   0 180   0 gcc/1
  9548 mynhier   0.1 0.2 0.0 0.0 0.0 0.0  84  16   1   3 163   0 gcc/1
  9546 mynhier   0.1 0.2 0.0 0.0 0.0 0.0  87  12   3   0 200   0 cc/1
  9544 mynhier   0.1 0.2 0.0 0.0 0.0 0.0  83  16   1   1 239   0 sh/1
 Total: 111 processes, 238 lwps, load averages: 5.46, 5.79, 5.68
The additional columns (TRP through LAT) indicate what percentage of time the process spent handling traps, servicing text page faults, servicing data page faults, waiting on user locks, sleeping, and waiting for CPU, respectively.
Given that this is only a two-processor machine, it's fairly obvious from the load average reported above that the machine is overloaded, but the load average doesn't give quite as detailed a picture of what's going on as the 'prstat -m' output does. In particular, the LAT column (CPU latency) is pretty instructive. All of the processes in the above example are spending some percentage of their time fully able to run but waiting to be switched onto a CPU. You could have guessed this from the load average, but the load average couldn't have told you that some of these processes were spending over 80% of their time sitting in a run queue.
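If you want to dig into a single offender, prstat can narrow the view to one process and break it out per thread. Something like this (the PID is just one lifted from the output above):

  $ prstat -mL -p 9559 5

The '-L' reports one line per lwp instead of per process, and the trailing 5 is the sampling interval in seconds, so you can watch LAT change as the run queue drains.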
Saturday, January 20, 2007
It's the little things
Guh. There's nothing quite as embarrassing as having made a simple formatting mistake in your blog and then having propagated it for a couple of months.
I just recently realized that this really only breaks how my blog appears in Firefox. Given that I use Opera, Safari, and IE, and given that they all displayed what I wanted to see, I never realized the mistake. So now I just republished two months' worth of entries to fix it.
Guh.
Monday, January 08, 2007
ZFS and committing data to stable storage
I ran across a couple of blog entries that might be of interest to anyone using ZFS on an NFS server or using ZFS on a storage array with a battery-backed cache.
WRT NFS over ZFS, this entry discusses why you might see worse performance with NFS over ZFS than with NFS over UFS (at least for a single-threaded load). The article points out that it's actually an apples-to-oranges comparison, as ZFS implements the correct behavior WRT NFS semantics -- NFS COMMITs are guaranteed to be on stable storage (via the ZFS intent log, or ZIL), regardless of whether the disk write cache is enabled. With UFS (or other filesystems), NFS COMMITs are considered successful once they've hit the disk cache, which is decidedly not stable storage.
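The local analogue of an NFS COMMIT is fsync(2), which likewise isn't supposed to return until the data is on stable storage. A minimal sketch of what that guarantee looks like from a C program (the path here is made up):

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
        const char buf[] = "some data\n";

        /* Hypothetical path; any file on a ZFS filesystem will do. */
        int fd = open("/tank/datafile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) {
                perror("open");
                exit(1);
        }
        if (write(fd, buf, sizeof (buf) - 1) == -1) {
                perror("write");
                exit(1);
        }
        /*
         * The stable-storage barrier. On ZFS the ZIL keeps this honest
         * even with the disk write cache enabled; with the cache enabled,
         * UFS can return with the data still sitting in volatile cache.
         */
        if (fsync(fd) == -1) {
                perror("fsync");
                exit(1);
        }
        (void) close(fd);
        return (0);
}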
On a similar note (and also linked from the entry above), this entry discusses ZFS performance problems with storage arrays that have battery-backed cache. In this case the cache can legitimately be considered stable storage, but ZFS still forces a flush of the cache after every write to the ZIL. The article discusses a couple of ways to handle this problem. One of them is definitely a bad idea (disabling the ZIL), but the other has merit (instructing the storage array to ignore the flush commands from ZFS).
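For the record, the ZIL-disabling approach that gets passed around is an /etc/system tunable, something like the following (if I'm remembering the tunable's name correctly). I'm showing it only so you know what to avoid:

* /etc/system fragment -- the bad idea. Don't do this on an NFS server:
* with the ZIL disabled, synchronous semantics (fsync, NFS COMMIT)
* are no longer honored across a crash or power loss.
set zfs:zil_disable = 1

The array-side alternative (having the array ignore SCSI cache-flush requests) is configured on the array itself and is vendor-specific, so there's no single incantation to show for it.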
WRT NFS over ZFS, this entry discusses why you might see worse performance with NFS over ZFS when compared to NFS over UFS (at least for a singly-threaded load.) The article points out that it's actually an apples-to-oranges comparison, as ZFS implements the correct behavior WRT NFS semantics -- NFS COMMITs are guaranteed to be on stable storage (via the ZFS intent log (ZIL)), regardless of whether the disk cache is enabled. With UFS (or other filesystems), NFS COMMITs are considered successful once they've hit the disk cache, which is decidedly not stable storage.
On a similar note (and linked from the above, also), this entry discusses ZFS performance problems with storage arrays using battery-backed cache. In this case, the cache can be considered stable storage, but ZFS still forces a flush of the cache after every write to the ZIL. The article discusses a couple of ways to handle this problem. One of these is definitely a bad idea (disabling the ZIL), but the other has merit (instructing the storage array to ignore the flush commands from ZFS.)
Thursday, January 04, 2007
Machine That Goes PING!
I was looking through the source code for ping (the Solaris version), and I noticed this:
if (getenv("MACHINE_THAT_GOES_PING") != NULL)
        stats = _B_TRUE;
So if you want the "-s" functionality without always having to specify the "-s", set this environment variable.
This fix doesn't appear to have been integrated before Solaris 10 6/06 was set in stone, though, and I don't have access to an 11/06 or recent Solaris Express installation to see it in action. It's not terribly earth-shaking as code changes go; I just found it amusing.
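Assuming you're on a build with the fix, using it would look something like this (host name made up):

  $ MACHINE_THAT_GOES_PING=1 ping somehost

which should behave like 'ping -s somehost', reporting per-packet statistics instead of just "somehost is alive".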
Wednesday, January 03, 2007
ZFS stores file creation time
I've probably run across this at some point and forgotten it, but I happened to notice it while looking at something else. ZFS stores the file creation time (crtime) in addition to the traditional atime, mtime, and ctime:
juser@server> sudo zdb -dddd trashme
[ ... ]
    Object  lvl   iblk   dblk  lsize  asize  type
        14    2    16K   128K  3.75M  3.75M  ZFS plain file
                                        264   bonus  ZFS znode
        path    /foo
        atime   Wed Jan  3 16:20:42 2007
        mtime   Wed Jan  3 16:20:42 2007
        ctime   Wed Jan  3 16:20:42 2007
        crtime  Wed Jan  3 15:26:53 2007
[ ... ]
And, of course, this can be seen in the source code:
/*
 * This is the persistent portion of the znode. It is stored
 * in the "bonus buffer" of the file. Short symbolic links
 * are also stored in the bonus buffer.
 */
typedef struct znode_phys {
        uint64_t zp_atime[2];   /*  0 - last file access time */
        uint64_t zp_mtime[2];   /* 16 - last file modification time */
        uint64_t zp_ctime[2];   /* 32 - last file change time */
        uint64_t zp_crtime[2];  /* 48 - creation time */
        /* ... */
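Each timestamp is a pair of 64-bit words. Going by the ZFS_TIME_DECODE macro in the same source, the first word holds seconds and the second nanoseconds, so turning zp_crtime into an ordinary timespec is roughly this (my own sketch, not code from the source):

#include <sys/types.h>
#include <sys/time.h>

/*
 * Sketch of what ZFS_TIME_DECODE does: word 0 is seconds,
 * word 1 is nanoseconds.
 */
static void
decode_zfs_time(const uint64_t ztime[2], timespec_t *tsp)
{
        tsp->tv_sec = (time_t)ztime[0];
        tsp->tv_nsec = (long)ztime[1];
}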