lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: index bigger than it should be?
Date Thu, 27 Oct 2011 13:27:54 GMT
There's org.apache.lucene.index.CheckIndex, which reports assorted
stats about the index as well as checking it for correctness. It can
also fix a broken index, but you don't need that. I hope. It will take
quite a while to run on a large index.

What version of Lucene? Does a before/after (or large/small)
directory listing give any clues?
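For the directory-listing route, a small JDK-only program can total the index files by extension. That gives a compact dump that can be saved every day and compared from one day to the next. This is a sketch, not a Lucene tool: the class name IndexSizeByExtension and the LEGEND descriptions are illustrative, and the descriptions assume the Lucene 3.x file format (other versions use different extensions). It only reads file sizes and never opens the index. If segments are written as compound files, most of their data is lumped together under cfs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class IndexSizeByExtension {

    // Rough meaning of common Lucene 3.x index file extensions (assumption: 3.x format).
    static final Map<String, String> LEGEND = Map.of(
        "fdt", "stored field values",
        "fdx", "stored field index",
        "tis", "term dictionary",
        "tii", "term dictionary index",
        "frq", "postings: doc ids and term frequencies",
        "prx", "postings: term positions",
        "nrm", "norms",
        "tvf", "term vector fields",
        "del", "deleted documents",
        "cfs", "compound file (bundles a segment's files)");

    /** Sums the sizes of the regular files directly inside dir, keyed by extension. */
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys make daily dumps easy to compare
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue; // skip subdirectories
                String name = file.getFileName().toString();
                // segments_N and segments.gen are grouped under one key
                String key = name.startsWith("segments") ? "segments" : extensionOf(name);
                totals.merge(key, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    /** Returns the text after the last dot, or "(none)" if the name has no dot. */
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "(none)" : name.substring(dot + 1);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long total = 0;
        for (Map.Entry<String, Long> e : sizeByExtension(dir).entrySet()) {
            System.out.printf("%-10s %,18d  %s%n",
                e.getKey(), e.getValue(), LEGEND.getOrDefault(e.getKey(), ""));
            total += e.getValue();
        }
        System.out.printf("%-10s %,18d%n", "TOTAL", total);
    }
}
```

Run it as java IndexSizeByExtension /path/to/index and keep each day's output. If, say, the fdt total grows much faster than the rest, the growth is in stored fields rather than in the term data.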


--
Ian.


On Thu, Oct 27, 2011 at 12:44 PM,  <v.sevel@lombardodier.com> wrote:
> Hi,
>
> I have an application with an index of 30 million docs. Every
> day I add around 1 million docs and remove the oldest 1 million, to
> keep it stable at 30 million.
> Most doc fields are indexed and stored. Each doc weighs
> from a few KB to 1 MB (a few MB in some cases).
> I used to be able to keep the index at around 60 GB on disk, but
> recently it has tended to keep growing (now 90 GB). I can see
> that the expunge is doing what it should, because after it runs
> the size on disk does go down, but never as low as the previous day. From
> the outside it looks like a leak, but since I do not remove the docs I
> added during the day, it might be that the new docs are just bigger than
> the old ones. Still, I am surprised by the increase.
>
> Are there any tools to dig into the index structure and help account for the
> space taken on disk?
> I was thinking of something that would help identify the terms that take up
> the most space, or some sort of dump that I could compare from one day to
> the next.
>
> Any help appreciated,
>
> thanks,
>
> Vince
