lucene-java-user mailing list archives

From Ramprakash Ramamoorthy <>
Subject Separating the document dataset and the index dataset
Date Fri, 07 Dec 2012 07:32:37 GMT

         We are using Lucene in our log analysis tool. We receive around
35 GB of data a day, and we have a practice of zipping week-old indices and
unzipping them when the need arises.

           Though the compression offers huge savings in disk space, the
decompression becomes an overhead. At times it takes around 10 minutes
(decompression accounts for 95% of that time) to search across a month-long
set of logs. We need to unzip fully at least to get the total count from
the index.

           My question is: we are setting Field.Store to YES. Is there a
way to split the index dataset from the document dataset? In my
understanding, if such a separation is possible, the document dataset alone
could be zipped, leaving the index dataset on disk. Would it be feasible to
do this? Any pointers?

           Or is adding more disks the only solution? Thanks in advance!

With Thanks and Regards,
Ramprakash Ramamoorthy,
+91 9626975420
