lucene-java-user mailing list archives

From Dan OConnor <docon...@acquiremedia.com>
Subject Help to determine why an optimized index is proportionally too big.
Date Wed, 01 Apr 2009 21:20:26 GMT
All:

We are using Java Lucene 2.3.2 to index a fairly large number of documents (roughly 400,000
per day). We have divided the time history into stages of various depths.

Our first stage covers 8 days and our next stage covers 22 days. The index directory for the
first stage is approximately 20GB when fully optimized. The index directory of our second stage
is over 250GB when optimized. Our third stage (which covers 60 days) is only ~80GB when optimized.

The second stage index failed an optimization with a disk full exception (I had to move it
to another Lucene machine with a larger disk partition to complete the optimization). Is there
a reason why a 22 day index would be more than 10x the size of an 8 day index when the document
indexing rate is fairly constant? Also, is there a way to shrink the index without regenerating it?
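
To make the mismatch concrete, here is a rough sketch of the per-day arithmetic using the approximate sizes quoted above. The class and method names are only illustrative, and it assumes the index size should scale roughly linearly with the number of days covered, using the 8 day stage as the baseline.

```java
public class IndexSizeCheck {
    // Approximate optimized index size in GB per day of documents covered.
    public static double gbPerDay(double sizeGb, int days) {
        return sizeGb / days;
    }

    // Size a stage would have if it scaled linearly from the
    // 8 day / ~20GB first stage (an assumption, not a Lucene guarantee).
    public static double expectedGb(int days) {
        return gbPerDay(20.0, 8) * days;
    }

    public static void main(String[] args) {
        int[] days = {8, 22, 60};
        // Approximate sizes from this message; the 22 day figure is "over 250GB".
        double[] actualGb = {20.0, 250.0, 80.0};
        for (int i = 0; i < days.length; i++) {
            System.out.println(days[i] + " days: actual ~" + actualGb[i]
                + " GB, per day ~" + gbPerDay(actualGb[i], days[i])
                + " GB, linear estimate ~" + expectedGb(days[i]) + " GB");
        }
    }
}
```

By that estimate the 22 day stage should be around 55GB and the 60 day stage around 150GB, so the second stage is well over four times larger than expected while the third stage is smaller than expected.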

Any help/pointers would be greatly appreciated.

Thanks and Regards,
Dan

Dan O'Connor
SVP, Engineering
Acquire Media<http://www.acquiremedia.com/>
e: doconnor@acquiremedia.com<mailto:doconnor@acquiremedia.com>

