lucene-java-user mailing list archives

Subject index bigger than it should be?
Date Thu, 27 Oct 2011 11:44:29 GMT

I have an application with an index of 30 million docs. Every day I add
around 1 million docs and remove the oldest 1 million, to keep it stable at
30 million.
Most doc fields are both indexed and stored. Each doc ranges from a few KB
to 1 MB (a few MB in some cases).
I used to be able to keep the index at around 60 GB on disk, but recently
it has tended to keep growing (now 90 GB). I can see that the expunge is
doing what it should, because the size on disk does go down after it runs,
but never as low as the previous day. From the outside it looks like a
leak, but since the docs added during the day stay in the index, it may be
that the new docs are simply bigger than the old ones. Still, I am
surprised by the increase.
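One way to tell "the new docs are bigger" apart from "space is not being reclaimed" is to track two ratios every day. The sketch below assumes you record the on-disk size of the index directory plus the two document counts that Lucene's `IndexReader` reports: `maxDoc()` (all documents, including deleted ones not yet merged away) and `numDocs()` (live documents). The class and method names are hypothetical, and the code does plain arithmetic only, so it has no Lucene dependency.

```java
/** Hypothetical helper: daily size ratios for a rolling-window index. */
final class IndexGrowthCheck {

    /** Average on-disk bytes per live (non-deleted) document. */
    static double bytesPerLiveDoc(long indexBytes, long numDocs) {
        if (numDocs <= 0) throw new IllegalArgumentException("numDocs must be positive");
        return (double) indexBytes / numDocs;
    }

    /** Fraction of document slots still occupied by deleted documents. */
    static double deletedRatio(long maxDoc, long numDocs) {
        if (maxDoc <= 0) throw new IllegalArgumentException("maxDoc must be positive");
        return (double) (maxDoc - numDocs) / maxDoc;
    }

    public static void main(String[] args) {
        // Figures from the message above, assuming 1 GB = 10^9 bytes.
        long liveDocs = 30_000_000L;
        System.out.printf("before: %.0f bytes/doc%n", bytesPerLiveDoc(60_000_000_000L, liveDocs));
        System.out.printf("now:    %.0f bytes/doc%n", bytesPerLiveDoc(90_000_000_000L, liveDocs));
    }
}
```

With a stable 30 million live docs, going from 60 GB to 90 GB means the average per-doc footprint rose by half, from roughly 2 KB to 3 KB. If `deletedRatio` stays near zero after the expunge, the extra space presumably belongs to live docs (the newer docs really are larger); if it stays high, deleted docs are still held in segments that have not been merged.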

Are there any tools to dig into the index structure and help account for
the space taken on disk?
I was thinking of something that would identify the terms that take up the
most space, or some sort of dump that I could compare from one day to the
next.
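Short of a dedicated tool, a cheap first step is to break the index directory down by file extension and keep one snapshot per day. In the Lucene 3.x file format, stored fields live in `.fdt`/`.fdx`, the term dictionary in `.tis`/`.tii`, postings in `.frq`/`.prx`, norms in `.nrm`, and term vectors in `.tvx`/`.tvd`/`.tvf`; segments written in the compound format (`.cfs`) lump these together. The sketch below uses only `java.nio.file`, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tool: sums index file sizes by extension so daily snapshots can be compared. */
final class IndexFileBreakdown {

    /** Returns total bytes per file extension (e.g. "fdt", "prx") in one index directory. */
    static Map<String, Long> sizesByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Names without an extension (e.g. segments_N) are grouped by the part before '_'.
                String key = dot >= 0 ? name.substring(dot + 1) : name.replaceAll("_.*", "");
                totals.merge(key, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    /** Per-extension change in bytes between two snapshots (today minus yesterday). */
    static Map<String, Long> diff(Map<String, Long> yesterday, Map<String, Long> today) {
        Map<String, Long> delta = new TreeMap<>();
        for (Map.Entry<String, Long> e : yesterday.entrySet()) delta.put(e.getKey(), -e.getValue());
        for (Map.Entry<String, Long> e : today.entrySet()) delta.merge(e.getKey(), e.getValue(), Long::sum);
        return delta;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        sizesByExtension(dir).forEach((ext, bytes) -> System.out.printf("%-10s %,15d%n", ext, bytes));
    }
}
```

Comparing two days with `diff` shows which part of the index accounts for the growth: since most fields are stored, a rising `.fdt` total would suggest larger stored documents, while growth in `.tis`/`.frq`/`.prx` would point at more or longer indexed terms.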

Any help appreciated,



