lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject Re: Index Size
Date Wed, 18 Aug 2004 21:01:55 GMT
From: Doug Cutting
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg08757.html

> An index typically requires around 35% of the plain text size.

I think it's a little big.

sv

On Wed, 18 Aug 2004, Rob Jose wrote:

> Hello
> I have indexed several thousand (52 to be exact) text files and I keep 
> running out of disk space to store the indexes.  The size of the 
> documents I have indexed is around 2.5 GB.  The size of the Lucene 
> indexes is around 287 GB.  Does this seem correct?  I am not storing the 
> contents of the file, just indexing and tokenizing.  I am using Lucene 
> 1.3 final.  Can you guys let me know what you are experiencing?  I don't 
> want to go into production with something that I should be configuring 
> better.  
> 
> I am not sure if this helps, but I have a temp index and a real index.  I index the file
into the temp index, and then merge the temp index into the real index using the addIndexes
method on the IndexWriter.  I have also set the production writer setUseCompoundFile to true.
 I did not set this on the temp index.  The last thing that I do before closing the production
writer is to call the optimize method.  
> 
> I would really appreciate any ideas to get the index size smaller if it is at all possible.
> 
> Thanks
> Rob


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message