lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Help to determine why an optimized index is proportionaly too big.
Date Thu, 02 Apr 2009 14:47:07 GMT
On Wed, Apr 1, 2009 at 5:20 PM, Dan OConnor <doconnor@acquiremedia.com> wrote:
> All:
>
> We are using java lucene 2.3.2 to index a fairly large number of documents (roughly 400,000
per day). We have divided the time history into various depths.
>
> Our first stage covers 8 days and our next stage covers 22. The index directory for the
first stage is approximately 20G when fully optimized. The index directory of our second stage
is over 250GB when optimized. Our third stage (which is 60 days) is only ~80GB when optimized.

Are these divisions overlapping?  Ie, 2nd index includes all docs from
the first one (8 days) plus another 14 days, and 3rd index (60 days)
includes all 22 days from 2nd one plus another 38 days?

If so, I don't understand eg why the 60 day index is smaller than the
22 day index.  That's odd.

> The second stage index failed an optimization with a disk full exception (I had to move
it to another lucene machine with a larger disk partition to complete the optimization. Is
there a reason why a 22 day index would be 10x the size of an 8 day index when the document
indexing rate is fairly constant? Also, is there a way to shrink the index without regenerating
it?

You can't shrink the index without deleting docs... eg you could
delete docs, and add newer docs that store less stuff, to shrink the
index over time (merging/optimizing will reclaim disk space).

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message