Friday, May 19, 2006
ZFS benchmarking
ZFS was made publicly available on November 16, 2005. I was doing my usual scan of Sun blogs and saw dozens of different entries announcing this. I'd been waiting a year or so to get my hands on ZFS, so I feverishly set about downloading the appropriate bits so that I could install a server and start playing with it. (I'm being very literal when I say "feverishly" -- I wasn't feeling well that day, and when I got home that evening I measured myself at 102 degrees or thereabouts (39 degrees for anyone wondering why I was above the boiling point of water).)
Over the next few weeks, I ran some benchmarks comparing ZFS to UFS, ext3fs, and reiserfs. I avoided the standard benchmarks and instead used a script I'd developed earlier in the year to compare some cheap NAS implementations. That script was originally intended simply to generate a large amount of data in a filesystem hierarchy mirroring what we would be doing with that cheap NAS; a companion script run across a couple dozen servers generated the read traffic. The company I work for would probably balk if I posted the script here, but it essentially creates a filesystem hierarchy that looks like [00-FF]/[00-FF]/[0-F]/[1-64], where the 64 files at each leaf are ~10k.
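I can't share the real script, but a rough sketch of the kind of generator it describes might look like the following. (This is purely illustrative -- the function names, the way the leaf path is derived from a counter, and the process layout are my own invention here, not the actual implementation.)

```python
import os
import multiprocessing

FILE_SIZE = 10 * 1024  # each leaf file is ~10k

def fill_leaf_dirs(worker_id, num_leaves, files_per_leaf, root):
    """Create `num_leaves` leaf directories in a [00-FF]/[00-FF]/[0-F]
    hierarchy and write `files_per_leaf` ~10k files (named 1..N) into each."""
    for i in range(num_leaves):
        # Derive a unique leaf path from a per-worker counter; the tuple
        # ((n>>12)&0xFF, (n>>4)&0xFF, n&0xF) is distinct for each n.
        n = worker_id * num_leaves + i
        d = os.path.join(root,
                         "%02X" % ((n >> 12) & 0xFF),  # first [00-FF] level
                         "%02X" % ((n >> 4) & 0xFF),   # second [00-FF] level
                         "%X" % (n & 0xF))             # [0-F] level
        os.makedirs(d, exist_ok=True)
        for f in range(1, files_per_leaf + 1):
            with open(os.path.join(d, str(f)), "wb") as fh:
                fh.write(os.urandom(FILE_SIZE))

def write_data(concurrency, leaves_per_proc, files_per_leaf, root="."):
    """Rough equivalent of `write-data -c ... -u ... -m ...`: run the
    generator in several concurrent processes to keep the disk busy."""
    procs = [multiprocessing.Process(
                 target=fill_leaf_dirs,
                 args=(w, leaves_per_proc, files_per_leaf, root))
             for w in range(concurrency)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With something like this, `write_data(5, 1200, 64)` would write 5 * 1200 * 64 = 384,000 files of ~10k each -- the roughly 3.7GB figure below.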
I started out by trying to determine the parameters I wanted to use when running the benchmark against different filesystems, as I wanted to get 100% disk utilization. Unfortunately, I used ZFS to determine these parameters. This turned out to be serious overkill for the other filesystems, as the numbers below indicate. Here are the parameters I used, with an explanation for the values:
write-data -c 5 -u 1200 -m 64

I'm running 5 concurrent processes, each creating 1200 leaf directories with 64 files each. So 5 * 1200 * 64 * 10k is about 3.7GB of data in all (plus metadata).
The server I was using was a 2 x 2.8GHz Dell 1850 with a single 73GB SCSI disk and 2GB RAM. I ran the tests using both UFS and ZFS under Solaris x86 and both ext3fs and reiserfs under Linux. To avoid differences in performance between the inside and the outside of the disk, I used the same cylinders on the disk for all tests (plus or minus a cylinder or two.) The times include syncing the data to disk.
Here are the results of these runs (averaged over several runs each.) The "starting empty" times represent runs with a newly-created filesystem. The "consecutive run" times represent runs when I don't clean up after a "starting empty" run, i.e., the files are being rewritten into the existing filesystem structure.
Filesystem                      | UFS   | ZFS  | ext3fs | reiserfs
Time (min:sec), starting empty  | 28:16 | 2:49 | 60:39  | 46:25
Time (min:sec), consecutive run | 59:57 | 5:34 | 20:26  | 50:31
So ZFS is the fastest all around for this particular workload, by a spectacular margin. (It's probably also interesting to note that ext3fs was the only filesystem that was actually faster on "consecutive" runs. Given its asynchronous metadata updates, that might not be surprising.) But, as I stated earlier, this was a workload designed to keep the disk busy when ZFS is being used. So while I'd demonstrated that ZFS can handle a heavy workload better than the other filesystems, I hadn't demonstrated that it's faster under a reasonable workload. So I re-calibrated to keep all the filesystems below 100% disk utilization, which ended up meaning these parameters:
write-data -c 1 -u 1200 -m 64

So instead of 5 concurrent processes, there's just 1. And here are the results:
Filesystem                      | UFS   | ZFS  | ext3fs | reiserfs
Time (min:sec), starting empty  | 3:24  | 0:35 | 7:28   | 4:43
Time (min:sec), consecutive run | 11:01 | 0:38 | 1:10   | 2:34
So ZFS still won, but the margin of victory wasn't quite as large as in the first test. And here we see reiserfs doing better on consecutive runs, too. But while the above is informative, it still doesn't tell the full story. It's also interesting to look at the disk utilization during these runs. (Note that this wasn't a rigorous measurement -- just eyeballing iostat output during the tests.)
Filesystem                     | UFS    | ZFS   | ext3fs | reiserfs
% Utilization, starting empty  | 95-100 | 45-50 | 95-100 | 95-100
% Utilization, consecutive run | 95-100 | 51-56 | 95-100 | 95-100
Comments:
Wow, someone's actually reading my ramblings! :-)
Thanks for the link. I'll be posting a couple more benchmarks soon. Well, one of them is really just a demonstration of the I/O reordering ZFS does to give reads preference. But it's pretty impressive.
Chad
Thanks for these benchmarks. I was already really fascinated by ZFS, but this seals the deal.
I'll use your benchmarks on my website and keep writing some HOWTOs for ZFS.
My website is in German, but this is my "at-work" homepage: http://www.users.sbg.ac.at/~widhalmt
Chad - you get a blogger high 5 - great content, perfect tone & presentation.
If only I could blog like this - sniff , sniff...
Thanks for the benchmark info.
Filesystem benchmarking is not an easy task. For example, some filesystems may read/write faster but use more CPU/RAM. That's why (I think) it's important to do so-called "real-world" benchmarks. I imagine a typical scenario like a database server doing very complex and heavy queries, thus stressing both memory and CPU. Of course, performance may vary depending on the RDBMS used and its internal storage engine.
I would try something like the IMDb database, for example...
Well, I think you missed a lot of things in this benchmark...
What kind of journaling did you use with ext3, for example?
Also, where is your 'script', so it can be validated? Why not use the actual, well-known filesystem benchmarking tools, like IOzone, which give many kinds of results?
Cya,
Rodrigo (BSDaemon).
Rodrigo Rubira Branco
Good stuff Chad.
Could you please remove the post by Anonymous : 9:11 PM
That reads ...
I find some information here.
And links to ADVERT "content" at the URL below ...
http://10oal.info/3379
And do you really want those refinancing links?