lucene-java-user mailing list archives

From John Haxby
Subject Re: Performance and FS block size
Date Sun, 12 Feb 2006 21:48:57 GMT
Otis Gospodnetic wrote:

>I'm somewhat familiar with ext3 vs. ReiserFS stuff, but that's not really what I'm after
>(finding a better/faster FS).  What I'm wondering is about different block sizes on a single
>(ext3) FS.
>If I understand block sizes correctly, they represent a chunk of data that the FS will
>read in a single read.
>- If the block size is 1K, and Lucene needs to read 4K of data, then the disk will have
>to do 4 reads, and will read in a total of 4K.
>- If the block size is 4K, and Lucene needs to read 3K of data, then the disk will have
>to do 1 read, and will read a total of 3K, although that will actually consume 4K, because
>that's the size of a block.
That's correct, Otis.   Applications generally get the best performance 
when they read data in units of the file system block size (or small 
multiples thereof), which for ext2 and ext3 is almost always 4k.  It might 
be interesting to try making file systems with different block sizes to 
see what the effect on performance is, and perhaps also to try larger 
block sizes in Lucene, always keeping Lucene's block size a multiple 
of the file system block size.   As an educated guess, I'd say that 
4k/4k gives better performance than smaller file system block sizes and 
that 8k/4k is not likely to have much of an effect either way.
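
As a rough sketch of the block arithmetic and of reading in multiples of the file system block size: the class and method names below are made up for illustration and are not part of Lucene, the 4k block size is assumed rather than queried from the file system, and the block count assumes the read starts on a block boundary. Timings from something this simple are dominated by the OS page cache, so treat it as a starting point for an experiment, not a benchmark.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class BlockSizedRead {
    // Assumption: 4k file system blocks, the usual ext2/ext3 value.
    public static final int FS_BLOCK_SIZE = 4096;

    // Blocks touched by a read of `bytes` bytes that starts on a block boundary.
    public static long blocksNeeded(long bytes, int blockSize) {
        return (bytes + blockSize - 1) / blockSize;
    }

    // Round a requested buffer size up to a whole number of blocks.
    public static int roundUpToBlocks(int requested, int blockSize) {
        return (int) (blocksNeeded(requested, blockSize) * blockSize);
    }

    // Read the file front to back with the given buffer size; returns bytes read.
    public static long readAll(Path file, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                total += buf.position(); // bytes delivered by this read
                buf.clear();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        // Requested sizes are rounded up to 4k, 8k and 16k buffers.
        for (int requested : new int[] {4096, 6000, 16000}) {
            int size = roundUpToBlocks(requested, FS_BLOCK_SIZE);
            long start = System.nanoTime();
            long bytes = readAll(file, size);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println(size + "-byte buffer: " + bytes + " bytes in " + micros + " us");
        }
    }
}
```

With the numbers from the question, blocksNeeded(4096, 1024) is 4 and blocksNeeded(3072, 4096) is 1.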

>Does any of this sound right?
>I recall Paul Elschot talking about disk reads and disk arm movement, and Robert Engels
>talking about Nio and block sizes, so they might know more about this stuff.
It depends very much on the type of disk: 15,000 rpm ultra-scsi 320 
disks on a 64-bit PCI card will probably be faster than a 4200 rpm disk 
in a laptop :-)   Seriously, disk configuration makes a lot of 
difference: striped RAID arrays will give the best I/O performance 
(given a controller and whatnot that can exploit that).   Once you get 
into huge amounts of I/O there are other, more complex issues that affect 

java.nio has the right features to exploit the I/O subsystem of the OS 
to good advantage.   We haven't done the performance measurements yet, 
but memory-mapped I/O should yield the best performance (as well as 
freeing you from worrying about what block size is best).    It will 
also be interesting to try the different I/O schedulers under Linux: cfq 
is the default for the 2.6 kernel that Red Hat ships, but I can imagine 
the deadline scheduler may give interesting results.   As I say, at some 
stage over the next few months we're likely to be looking at this in 
more detail.
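
For reference, memory-mapped reading with java.nio looks roughly like the sketch below. The class and method names are hypothetical and this is not how Lucene itself reads its index; it only shows that, once a file is mapped, the program never chooses a read buffer size and the OS pages the data in as it is touched.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    // Map the whole file read-only and sum its bytes as unsigned values.
    // One mapping is limited to Integer.MAX_VALUE bytes, so a larger file
    // would need several mappings.
    public static long sumBytes(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (map.hasRemaining()) {
                sum += map.get() & 0xFF;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumBytes(Paths.get(args[0])));
    }
}
```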

The one thing that makes more difference than anything else, though, is 
locality of reference; this seems to be well understood by the Lucene index 
format and is probably why the performance is generally good!

