lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Performance and FS block size
Date Sun, 12 Feb 2006 20:23:38 GMT
Hi,

I'm somewhat familiar with the ext3 vs. ReiserFS comparisons, but that's not really what I'm
after (finding a better/faster FS).  What I'm wondering about is different block sizes on a
single (ext3) FS.
If I understand block sizes correctly, a block is the chunk of data that the FS reads in a
single read.
- If the block size is 1K, and Lucene needs to read 4K of data, then the disk has to do 4
reads, fetching 4K in total.
- If the block size is 4K, and Lucene needs to read 3K of data, then the disk has to do only
1 read, but it fetches a full 4K block, so 1K of what it reads is wasted.
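The arithmetic above can be sketched as a quick back-of-the-envelope calculation (a toy model only; it ignores readahead, the page cache, and fragmentation):

```java
// Toy model of the block-size arithmetic: reading N bytes costs
// ceil(N / blockSize) block reads, and the FS fetches whole blocks.
public class BlockMath {

    static long blocksRead(long bytesNeeded, long blockSize) {
        return (bytesNeeded + blockSize - 1) / blockSize;  // ceiling division
    }

    static long bytesFetched(long bytesNeeded, long blockSize) {
        return blocksRead(bytesNeeded, blockSize) * blockSize;
    }

    public static void main(String[] args) {
        // 1K blocks, 4K needed: 4 reads, 4096 bytes fetched
        System.out.println(blocksRead(4096, 1024) + " reads, "
                + bytesFetched(4096, 1024) + " bytes fetched");
        // 4K blocks, 3K needed: 1 read, 4096 bytes fetched (1K wasted)
        System.out.println(blocksRead(3072, 4096) + " read, "
                + bytesFetched(3072, 4096) + " bytes fetched");
    }
}
```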

If the above is correct, then I think Lucene performance will depend on the block size, the
types of searches, and the layout of the data on disk.
For instance, if queries are completely random, require small reads (e.g. short postings
lists), and hit data that is scattered around the index/disk, then a smaller block size won't
hurt as much.
On the other hand, if queries are not random (i.e. they keep hitting the same part of the
index), or if the data on disk is sorted chronologically and queries sort chronologically too,
and reads pull in larger chunks of data (e.g. if you have large pieces of stored text and you
read them off disk), then larger blocks should work better, because they require the disk to
perform fewer reads.

Does any of this sound right?
I recall Paul Elschot talking about disk reads and disk arm movement, and Robert Engels talking
about NIO and block sizes, so they might know more about this.

Thanks,
Otis



----- Original Message ----
From: Byron Miller <byronmhome@yahoo.com>
To: java-user@lucene.apache.org
Sent: Fri 10 Feb 2006 10:02:35 PM EST
Subject: Re: Performance and FS block size

Otis,

If I'm not mistaken, block size, especially on ext3,
becomes an issue when you hit a peak number of total
blocks and lose performance on inode lookup compared
to that of ReiserFS.  For example, you may gain
performance by going from 1K to 4K blocks on ext3, but
ReiserFS at that block size should still be xx times
faster in many scenarios.

HOWEVER, that only matters if your data fits in
that block size.  If you have hundreds of thousands of
1-4K files, ReiserFS with a 1K block size would be best
(least wasteful, and faster access because of its
B-tree lookup), but if you're dealing with lots of large
files there won't be much difference, unless you switch
altogether to XFS, which was designed with fairly
aggressive caching and performance in mind.  (It simply
doesn't wait and keeps on trucking, making heavy use of
memory to buffer throughput.)

What do your hdparm speeds look like?

eg:

booger@svr1 [/home/mozdex/segments]# dumpe2fs
/dev/sdb1 | grep "Block size"
dumpe2fs 1.32 (09-Nov-2002)
Block size:               4096
booger@svr1 [/home/mozdex/segments]# hdparm -tT
/dev/sdb1

/dev/sdb1:
 Timing buffer-cache reads:   3372 MB in  2.00 seconds
= 1685.89 MB/sec
 Timing buffered disk reads:  110 MB in  3.00 seconds
=  36.62 MB/sec
booger@svr1 [/home/mozdex/segments]#

My server was under load during these tests, but they
came out pretty good considering. :)

--- Otis Gospodnetic <otis_gospodnetic@yahoo.com>
wrote:

> Hi,
> 
> Thanks for the speedy answer, this is good to know.
> However, I was wondering about the FS block size...
> consider a Linux box:
> 
> $ dumpe2fs  /dev/sda1 | grep "Block size"
> dumpe2fs 1.36 (05-Feb-2005)
> Block size:               1024
> 
> That shows /dev/sda1 has 1K blocks.  I don't
> think these can be changed "on-the-fly"; they can
> be changed only by re-creating the FS (e.g. mkfs.ext3
> .... under Linux).  Thus, I can't test different
> block sizes easily, and am wondering if anyone has
> already done this, or simply knows which block size
> should, at least theoretically, perform better.
> 
> Thanks,
> Otis
> 
> ----- Original Message ----
> From: Michael D. Curtin <mike@curtin.com>
> To: java-user@lucene.apache.org
> Sent: Fri 10 Feb 2006 05:05:07 PM EST
> Subject: Re: Performance and FS block size
> 
> Otis Gospodnetic wrote:
> 
> > Hi,
> > 
> > I'm wondering if anyone has tested Lucene
> indexing/search performance with different file
> system block sizes?
> > 
> > I just realized one of the servers where I run a
> lot of Lucene indexing and searching has an FS with
> blocks of only 1K in size (typically they are 4k or
> 8k, I believe), so I started wondering what's better
> for Lucene - smaller or larger blocks?  I have a
> feeling 1K is too small, although I don't know
> enough to back up this feeling. :(
> 
> On my system (dual Xeon with a couple of 120GB
> S-ATA drives (not RAIDed), running Fedora Core 3)
> I changed BUFFER_SIZE in storage/OutputStream.java
> to 4096, achieving about 30% better indexing
> performance.  The search improvement was smaller,
> small enough that it was on the order of what I
> thought my measurement error was.  I tried values
> up to 64K, but there wasn't much change on my
> system after 4K.
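[Editor's note: the buffering effect Mike describes can be imitated with plain java.io as a hedged sketch. Lucene's own storage/OutputStream buffering differs in its details; CountingStream and writeCallsWithBuffer below are made-up names for illustration. The point is simply that a larger buffer turns many small writes into fewer, larger ones hitting the underlying stream.]

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferDemo {

    // Counts how many times the underlying stream is actually written to,
    // so we can compare buffer sizes.
    static class CountingStream extends ByteArrayOutputStream {
        int writeCalls = 0;
        @Override public synchronized void write(byte[] b, int off, int len) {
            writeCalls++;
            super.write(b, off, len);
        }
    }

    static int writeCallsWithBuffer(int bufferSize) {
        try {
            CountingStream sink = new CountingStream();
            OutputStream out = new BufferedOutputStream(sink, bufferSize);
            byte[] chunk = new byte[100];
            for (int i = 0; i < 1000; i++) {
                out.write(chunk);  // 100 KB written in small 100-byte pieces
            }
            out.flush();
            return sink.writeCalls;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("1K buffer: " + writeCallsWithBuffer(1024) + " underlying writes");
        System.out.println("4K buffer: " + writeCallsWithBuffer(4096) + " underlying writes");
    }
}
```

With a 4K buffer the underlying stream is hit roughly a quarter as often as with a 1K buffer, which is the same flavor of win Mike saw from raising BUFFER_SIZE.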
> 
> --MDC
> 
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

