lucene-java-user mailing list archives

From Jain Rahul <>
Subject RE: Separating the document dataset and the index dataset
Date Fri, 07 Dec 2012 07:41:56 GMT
If you are using Lucene 4.0 and can afford to compress your document dataset while indexing,
it will give a huge saving in disk space and also in I/O (and, as a result, in indexing throughput).

In our case it has helped us a lot: the compressed data was roughly one third the size
of the original document dataset.

You may want to check the link below.


-----Original Message-----
From: Ramprakash Ramamoorthy []
Sent: 07 December 2012 13:03
Subject: Separating the document dataset and the index dataset


We are using Lucene in our log analysis tool. We get around 35 GB of data a day, and our
practice is to zip week-old indices and unzip them when the need arises.

Though the compression offers a huge saving in disk space, the decompression
becomes an overhead. At times it takes around 10 minutes (decompression takes 95% of the
time) to search across a month-long set of logs. We need to unzip the indices fully even
just to get the total count from the index.

My question: we are setting Index.Store to true. Is there a way to split the index
dataset and the document dataset? In my understanding, if such a separation is possible,
the document dataset alone could be zipped, leaving the index dataset on disk. Would it
be feasible to do this? Any pointers?

           Or is adding more disks the only solution? Thanks in advance!
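
To make the idea of separating the two datasets concrete, here is a minimal sketch in plain Java. It does not use the Lucene API at all: the class SplitStore, its methods, and the whitespace-style tokenizer are hypothetical names invented for illustration, and java.util.zip stands in for whatever compression is actually used. The term index stays uncompressed, so a hit count needs no decompression, while each document is compressed on its own and inflated only when it is fetched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy illustration only (not Lucene): uncompressed term index, per-document compressed store. */
final class SplitStore {
    // "Index dataset": term -> ids of the documents containing it, kept uncompressed.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // "Document dataset": one DEFLATE-compressed blob per document.
    private final List<byte[]> compressedDocs = new ArrayList<>();

    /** Compresses and stores the text, indexes its distinct terms, returns the new document id. */
    int add(String text) {
        int id = compressedDocs.size();
        compressedDocs.add(compress(text.getBytes(StandardCharsets.UTF_8)));
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        for (String term : new LinkedHashSet<>(Arrays.asList(tokens))) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(id);
            }
        }
        return id;
    }

    /** Number of documents containing the term; touches only the uncompressed index. */
    int count(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList()).size();
    }

    /** Decompresses exactly one document, leaving all others compressed. */
    String fetch(int id) {
        return new String(decompress(compressedDocs.get(id)), StandardCharsets.UTF_8);
    }

    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated blob: stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt document blob", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        SplitStore store = new SplitStore();
        store.add("ERROR disk full on node1");
        int id = store.add("INFO backup done");
        System.out.println(store.count("error")); // answered from the index alone
        System.out.println(store.fetch(id));      // inflates a single document
    }
}
```

The design choice being illustrated is the granularity of compression: zipping a whole index forces a full unzip before any query, whereas compressing only the stored documents, in small independent units, keeps counting and matching cheap and defers decompression to the documents actually retrieved.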

With Thanks and Regards,
Ramprakash Ramamoorthy,
+91 9626975420
