Friday, May 19, 2006

 

ZFS benchmarking

ZFS was made publicly available on November 16, 2005. I was doing my usual scan of Sun blogs and saw dozens of different entries announcing this. I'd been waiting a year or so to get my hands on ZFS, so I feverishly set about downloading the appropriate bits so that I could install a server and start playing with it. (I'm being very literal when I say "feverishly" -- I wasn't feeling that well that day and measured myself at 102 degrees or thereabouts when I got home that evening. That's 39 degrees for anyone wondering why I was above the boiling point of water.)

Over the next few weeks, I ran some benchmarks comparing ZFS to UFS, ext3fs and reiserfs. I avoided the standard benchmarks and used a script I'd developed earlier in the year to compare some cheap NAS implementations. That script was originally intended simply to generate a large amount of data in a filesystem hierarchy that mirrored what we would be doing with that cheap NAS; a companion script, run across a couple dozen servers, generated read traffic against it. The company I work for would probably balk if I put the script here, but it essentially creates a filesystem hierarchy that looks like [00-FF]/[00-FF]/[0-F]/[1-64], where the 64 files at each leaf are ~10k.

I started out by trying to determine the parameters I wanted to use when running the benchmark against different filesystems, as I wanted to get 100% disk utilization. Unfortunately, I used ZFS to determine these parameters. This turned out to be serious overkill for the other filesystems, as the numbers below indicate. Here are the parameters I used, with an explanation for the values:
write-data -c 5 -u 1200 -m 64
I'm running 5 concurrent processes, each creating 1200 leaf directories with 64 files each. So 5 * 1200 * 64 * 10k is about 3.7GB of data in all (plus metadata.)
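Since I can't post the real script, here's a rough sketch of what it does. Everything below -- the function names, the exact way leaves are spread across the [00-FF]/[00-FF]/[0-F] levels, the per-worker scheme -- is illustrative, not the actual script:

```python
import os
import multiprocessing

def write_leaves(root, worker, num_dirs, files_per_dir, file_size=10 * 1024):
    """One worker: create num_dirs leaf directories, each holding
    files_per_dir files of ~10k apiece."""
    payload = b"x" * file_size
    for i in range(num_dirs):
        # Spread leaves across a [00-FF]/[00-FF]/[0-F] hierarchy
        # (this particular mapping is a guess at the layout).
        top = "%02x" % (i % 256)
        mid = "%02x" % ((i // 256) % 256)
        sub = "%x" % (worker % 16)
        leaf = os.path.join(root, top, mid, sub, str(i))
        os.makedirs(leaf, exist_ok=True)
        for n in range(1, files_per_dir + 1):
            with open(os.path.join(leaf, str(n)), "wb") as f:
                f.write(payload)

def write_data(root, concurrency=5, num_dirs=1200, files_per_dir=64):
    """Rough equivalent of `write-data -c 5 -u 1200 -m 64`."""
    workers = [multiprocessing.Process(
                   target=write_leaves,
                   args=(root, w, num_dirs, files_per_dir))
               for w in range(concurrency)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Running `write_data(path)` with the defaults produces the same 5 * 1200 * 64 files of ~10k each described above.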

The server I was using was a 2 x 2.8GHz Dell 1850 with a single 73GB SCSI disk and 2GB RAM. I ran the tests using both UFS and ZFS under Solaris x86 and both ext3fs and reiserfs under Linux. To avoid differences in performance between the inside and the outside of the disk, I used the same cylinders on the disk for all tests (plus or minus a cylinder or two.) The times include syncing the data to disk.

Here are the results of these runs (averaged over several runs each.) The "starting empty" times represent runs with a newly-created filesystem. The "consecutive run" times represent runs when I don't clean up after a "starting empty" run, i.e., the files are being rewritten into the existing filesystem structure.

Filesystem:                      UFS    ZFS   ext3fs  reiserfs
Time (min:sec, starting empty)   28:16  2:49  60:39   46:25
Time (min:sec, consecutive run)  59:57  5:34  20:26   50:31


So ZFS is the fastest all around for this particular workload, by a spectacular margin. (It's probably interesting to note that ext3fs was the only filesystem that was actually faster on "consecutive" runs. Given its asynchronous metadata updates, that might not be too surprising.) But, as I stated earlier, this was a workload designed to keep the disk busy when ZFS is being used. So while I'd demonstrated that ZFS can handle a heavy workload better than the other filesystems, I hadn't demonstrated that it's faster under a reasonable workload. I re-calibrated to keep all the filesystems below 100% disk utilization, which ended up being these parameters:
write-data -c 1 -u 1200 -m 64
So instead of 5 concurrent processes, there's just 1. And here are the results:

Filesystem:                      UFS    ZFS   ext3fs  reiserfs
Time (min:sec, starting empty)   3:24   0:35  7:28    4:43
Time (min:sec, consecutive run)  11:01  0:38  1:10    2:34


So ZFS still won, but the margin of victory wasn't quite as large as with the first test. And here we see reiserfs doing better on consecutive runs, too. But while the above is informative, it still doesn't show the full story. It's also interesting to note the disk utilization during these runs. (Note that this wasn't a rigorous measurement, just eyeballing iostat output during the tests.)

Filesystem:                      UFS     ZFS    ext3fs  reiserfs
% Utilization (starting empty)   95-100  45-50  95-100  95-100
% Utilization (consecutive run)  95-100  51-56  95-100  95-100
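(On Solaris, "utilization" here means the %b, percent busy, column of `iostat -xn` output. If you'd rather not eyeball it, a helper like the following can pull the column out of a device line; the column position is the only assumption, and the sample line is made up, not output from these runs.)

```python
def busy_percent(iostat_line):
    # An `iostat -xn` device line ends with: ... %w %b device,
    # so %b is the second-to-last whitespace-separated field.
    fields = iostat_line.split()
    return int(fields[-2])

# Illustrative sample line, not real output from the benchmark runs:
sample = "  120.3  410.7  1203.1  4107.4  0.0  1.9  0.0  3.5   0  48 c0t0d0"
print(busy_percent(sample))  # -> 48
```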



Comments:
Chad, good stuff. I posted a link to you on StorageMojo. Thanks.

Robin
 
Wow, someone's actually reading my ramblings! :-)

Thanks for the link. I'll be posting a couple more benchmarks soon. Well, one of them is really just a demonstration of the I/O reordering ZFS does to give reads preference. But it's pretty impressive.

Chad
 
Thanks for these benchmarks. I was already really fascinated by ZFS, but this seals the deal.
I will link to your benchmarks from my website and keep writing some ZFS howtos.
My website is in German, but this is my "at-work" homepage: http://www.users.sbg.ac.at/~widhalmt
 
Thanks, for this, it was exactly what I was looking for.
 
Chad - you get a blogger high 5 - great content, perfect tone & presentation.

If only I could blog like this - sniff , sniff...
 
Thanks for the benchmark info.
Filesystem benchmarking is not an easy task. For example, some filesystems may read and write faster but use more CPU and RAM. That's why (I think) it's important to do so-called "real-world" benchmarks. I imagine a typical scenario like a database server doing very complex and heavy queries, thus exercising both the memory and the CPU. Of course, performance may vary depending on the RDBMS used and its internal storage engine.
I would try something like the IMDb database, for example....
 
Well, I think you missed a lot of things in this benchmark...

What kind of journaling did you use with ext3, for example?

Also, where is your 'script', so it can be validated? Why not use the actual, well-known filesystem benchmark tools, like IOzone, which give many kinds of results?


Cya,


Rodrigo (BSDaemon).
Rodrigo Rubira Branco
 
Good stuff Chad.

Could you please remove the post by Anonymous : 9:11 PM

That reads ...

I find some information here.

And links to ADVERT "content" at the URL below ...

http://10oal.info/3379

And do you really want those refinancing links?
 