lucene-java-user mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: Index Size
Date Thu, 19 Aug 2004 08:09:01 GMT
Rob,

As Doug and Paul already mentioned, the index size is definitely too big :-(.

One possible cause, especially when running on a Windows platform, is
an IndexReader that stays open during the whole indexing process.
During indexing, the writer creates temporary segment files which are
later merged into bigger segments. Once a merge is done, the old
segment files are deleted. If an IndexReader is still open, the files
stay locked, cannot be deleted, and remain in the index directory. You
will end up with an index several times bigger than the dataset.
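
To see whether leftover segment files are what is taking up the space, a
small stand-alone check along the following lines might help. This is only
a sketch using the plain JDK, not anything from Lucene: the class name
IndexDirReport and its methods are made up for illustration, and it assumes
that files belonging to one segment share the part of the file name before
the first dot.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: reports disk usage of an index
// directory, grouped by file-name prefix.
final class IndexDirReport {

    // Sums file sizes per name prefix (the part before the first dot).
    // Assumption: files of the same segment share that prefix.
    static Map<String, Long> sizeByPrefix(File indexDir) {
        Map<String, Long> sizes = new TreeMap<>();
        File[] files = indexDir.listFiles();
        if (files == null) {
            return sizes; // not a directory, or not readable
        }
        for (File f : files) {
            if (!f.isFile()) {
                continue; // skip subdirectories
            }
            String name = f.getName();
            int dot = name.indexOf('.');
            String prefix = dot < 0 ? name : name.substring(0, dot);
            sizes.merge(prefix, f.length(), Long::sum);
        }
        return sizes;
    }

    // Adds up all per-prefix sizes.
    static long totalSize(Map<String, Long> sizes) {
        long total = 0;
        for (long s : sizes.values()) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        // The index directory is taken from the first argument (default: ".").
        File dir = new File(args.length > 0 ? args[0] : ".");
        Map<String, Long> sizes = sizeByPrefix(dir);
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue() + " bytes");
        }
        System.out.println("total\t" + totalSize(sizes) + " bytes");
    }
}
```

If the report shows many groups of files right after optimize has been
called, that would suggest old segments were not deleted.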

Can you check your code for any open IndexReaders when indexing, or 
paste the relevant part to the list so we can have a look at it?

Hope this helps,
Bernhard


Rob Jose wrote:

>Hello
>I have indexed several thousand (52 to be exact) text files and I keep running out of
>disk space to store the indexes.  The size of the documents I have indexed is around 2.5 GB.
>The size of the Lucene indexes is around 287 GB.  Does this seem correct?  I am not storing
>the contents of the file, just indexing and tokenizing.  I am using Lucene 1.3 final.  Can
>you guys let me know what you are experiencing?  I don't want to go into production with something
>that I should be configuring better.
>
>I am not sure if this helps, but I have a temp index and a real index.  I index the file
>into the temp index, and then merge the temp index into the real index using the addIndexes
>method on the IndexWriter.  I have also set the production writer setUseCompoundFile to true.
>I did not set this on the temp index.  The last thing that I do before closing the production
>writer is to call the optimize method.
>
>I would really appreciate any ideas to get the index size smaller if it is at all possible.
>
>Thanks
>Rob



