lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: Appropriate disk optimization for large index?
Date Mon, 18 Aug 2008 21:23:36 GMT

mattspitz wrote:

> Are the index files synced on writer.close()?

No, they aren't.  Not until 2.4 (trunk).

> Thank you so much for your help.  I think the seek time is the issue,
> especially considering the high merge factor and the fact that the  
> segments
> are scattered all over the disk.

You're welcome!  I agree: optimizing seek time seems likely to be the  
biggest win.

> Will a faster disk cache access affect the optimization and  
> merging?  I
> don't really have a sense for what of the segments are kept in  
> memory during
> a merge.  It doesn't make sense to me that Lucene would pull all of  
> the
> segments into memory to merge them, but I don't really know how.

Segments aren't kept in memory during merging... it's more like a  
cursor that sweeps through each of the files for the 50 segments being  
merged.  Lucene does buffer its reads, so we read a chunk into RAM and  
then pull bits off that chunk.  And the OS does readahead.  But  
otherwise it's all on disk and we make a single sweep through each of  
the segments to be merged.

So I wouldn't expect the disk cache's performance to impact Lucene,  
during merging or flushing.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message