lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Seeing what's occupying all the space in the index
Date Fri, 26 May 2006 20:27:26 GMT
It sounds as though those files may be corrupted, but I can't say for 
sure.  When you look at your index in Luke (the one with all the files, 
not the new one), do you see all the documents you would expect, with 
values that seem reasonable?  Luke also shows a listing of all the files 
it thinks are in the index.  Do they match what you see in a file 
listing on the command line?
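
For the command-line side of that comparison, a per-extension size summary of the index directory can be handy. The following is only a minimal sketch using plain java.nio, with no Lucene calls; the class name IndexSizeByExtension and the "(none)" bucket for files without an extension (such as segments and deletable) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or Luke.
public class IndexSizeByExtension {

    // Sums the sizes (in bytes) of the regular files directly inside dir,
    // keyed by file extension. Files with no dot go under "(none)".
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Index directory from the first argument, else the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : sizeByExtension(dir).entrySet()) {
            System.out.printf("%-8s %,15d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

Running it against the index directory prints one line per extension, which makes it easier to see whether the .fdt, .prx or .frq files account for most of the space and to check the file set against what Luke reports.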

Also, you may want to check whether any stale locks or the like are 
preventing you from doing an optimize.
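
One rough way to look for leftover locks is to list old *.lock files. This is a sketch under assumptions: it assumes the lock files end in ".lock" and sit in the directory you pass in (I believe Lucene versions of this era put them in java.io.tmpdir by default, but please verify that for your setup). The class name StaleLockFinder and the one-hour threshold are hypothetical. It only lists candidates and deletes nothing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene.
public class StaleLockFinder {

    // Returns the *.lock files in dir whose last-modified time is at least
    // minAge before now, sorted by path. Nothing is deleted.
    static List<Path> locksOlderThan(Path dir, Duration minAge, Instant now)
            throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> locks = Files.newDirectoryStream(dir, "*.lock")) {
            for (Path lock : locks) {
                Instant modified = Files.getLastModifiedTime(lock).toInstant();
                if (Duration.between(modified, now).compareTo(minAge) >= 0) {
                    stale.add(lock);
                }
            }
        }
        Collections.sort(stale);
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: locks live in the temp directory unless a path is given.
        Path dir = Paths.get(args.length > 0 ? args[0]
                : System.getProperty("java.io.tmpdir"));
        for (Path lock : locksOlderThan(dir, Duration.ofHours(1), Instant.now())) {
            System.out.println("possibly stale: " + lock);
        }
    }
}
```

A lock that is hours old while no indexing process is running is a reasonable suspect, but make sure no writer is active before removing anything.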

Rob Staveley (Tom) wrote:
> Indexing 55648 documents in a new clean directory, I see only .cfs files (+
> deletable  + segments). Disk usage is 65K for all of these, which means that
> each message takes ~1K of index space rather than > 10K as it does in my
> 99GB index.
> Bearing in mind that the large index has > 5 million Lucene documents
> indexed in it now, do you reckon I can merge the .fdt, .prx and .frq into a
> compound index?
> -----Original Message-----
> From: Grant Ingersoll
> Sent: 26 May 2006 18:38
> Subject: Re: Seeing what's occupying all the space in the index
>> Can you try a smaller sample in a clean directory and see what size it is
>> (so that it doesn't take as long to index)?


Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 
Voice:  315-443-5484 
Fax: 315-443-6886 
