lucene-java-user mailing list archives

From kiwi clive <kiwi_cl...@yahoo.com>
Subject Re: Lucene 3.6.0 Index Size
Date Fri, 26 Oct 2012 20:27:40 GMT
Hi Vitaly,

Your hunch is correct: there are unmerged segments left over. To get indexing throughput,
we use multiple threads on a single writer, flushing to disk periodically, so the writer can
stay open for some time (until the last thread terminates). After an optimize, however, the
index is closed. Thanks for the advice; I need to revisit the merging section of the application.

Clive




________________________________
 From: Vitaly Funstein <vfunstein@gmail.com>
To: java-user@lucene.apache.org 
Sent: Friday, October 26, 2012 8:13 PM
Subject: Re: Lucene 3.6.0 Index Size
 
One thing to keep in mind is that the default merge policy changed between
2.3.2 and 3.6 (I'm almost certain of that). It's just a hunch, but you
may have some unmerged segments left over at the end. Try calling
IndexWriter.close(true) after you're done indexing.
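
One rough way to test the unmerged-segments hunch without touching the writer is to count the compound segment files in the index directory. The sketch below uses only the JDK, not the Lucene API. It assumes the compound file format, where each segment is stored as one .cfs file, and the class name SegmentCount and its Summary type are hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene: scans an index directory on disk.
public class SegmentCount {

    /** Result of scanning an index directory. */
    static final class Summary {
        final int compoundSegments; // number of *.cfs files found
        final long totalBytes;      // combined size of all regular files

        Summary(int compoundSegments, long totalBytes) {
            this.compoundSegments = compoundSegments;
            this.totalBytes = totalBytes;
        }
    }

    /** Counts *.cfs files and sums file sizes (top level of the directory only). */
    static Summary scan(Path indexDir) throws IOException {
        int segments = 0;
        long bytes = 0L;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                bytes += Files.size(file);
                // Assumption: compound format, so one .cfs file per segment.
                if (file.getFileName().toString().endsWith(".cfs")) {
                    segments++;
                }
            }
        }
        return new Summary(segments, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Index directory is taken from the first argument, defaulting to ".".
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        Summary s = scan(indexDir);
        System.out.printf("%d compound segment file(s), %d bytes total%n",
                s.compoundSegments, s.totalBytes);
    }
}
```

Running this against both the 2.3.2 and the 3.6.0 index directories should show whether the larger index simply holds many more segment files. It is only a diagnostic: it does not distinguish live segments from files that are no longer referenced.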

On Fri, Oct 26, 2012 at 10:50 AM, kiwi clive <kiwi_clive@yahoo.com> wrote:

> Hello.
>
> We have an index that, when created using Lucene 2.3.2, has a size of about
> 4G.
>
> Creating the same index (with the same parameters) with Lucene 3.6.0
> results in an 11G index.
>
> Could someone shed some light on why the index is so much larger, given
> the same data and the same parameters?
>
> I realize this is a large version jump, but a more than doubling of index
> size does not seem like a step in the right direction to me ;-)
>
> I am using the compound file (cfs) format.
>
> Thanks,
> Clive
>