lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: index bigger than it should be?
Date Thu, 27 Oct 2011 13:27:54 GMT
There's org.apache.lucene.index.CheckIndex, which will report assorted
stats about the index as well as checking it for correctness.  It can
fix a broken index too, but hopefully you won't need that.  It will take
quite a while to run on a large index.

What version of Lucene are you using?  Does a before/after (or
large/small) directory listing give any clues?
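
For the directory listing comparison, here is a minimal sketch (not part of the original message) that totals the file sizes in an index directory per file extension, so that two days' output can be compared side by side. The class name IndexSizeByExtension and its method are made up for illustration, and it uses only the standard Java library, not Lucene. Which extensions matter depends on the Lucene version and on whether compound files are in use; in the pre-4.0 file formats, for instance, stored field data lives in .fdt files and the term dictionary in .tis files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: not a Lucene class, just a plain directory scan.
public class IndexSizeByExtension {

    // Sums the sizes (in bytes) of the regular files directly inside dir,
    // grouped by file extension. Files without a dot go under "(none)".
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot + 1) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is taken from the first argument (assumption:
        // the index sits in a single flat directory on the local file system).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : sizeByExtension(dir).entrySet()) {
            System.out.printf("%-8s %,15d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

Running it once after each day's expunge and diffing the output should show which kind of file accounts for the growth, for example stored fields versus postings.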


On Thu, Oct 27, 2011 at 12:44 PM,  <> wrote:
> Hi,
> I have an application that has an index with 30 million docs in it. Every
> day, I add around 1 million docs and remove the oldest 1 million, to
> keep it stable at 30 million.
> For the most part, doc fields are indexed and stored. Each doc weighs
> from a few KB to 1 MB (a few MB in some cases).
> I used to be able to maintain the index at around 60 GB on disk, but
> recently the index has had a tendency to keep growing (90 GB). I can see
> that the expunge is doing what it should do, because after it executes,
> the size on disk does go down, but never as low as the previous day. From
> the outside, it looks like a leak, but since I do not remove the docs I
> added during the day, it might be that the new docs are just bigger than
> the old ones. Still, I am surprised by the increase.
> Are there any tools to dig into the index structure and help justify the
> space taken on disk?
> I was thinking about something that would help identify terms that take up
> the most space, or some sort of dump that I could compare from one day to
> the other.
> Any help appreciated,
> thanks,
> vince
