From: Jain Rahul
To: "java-user@lucene.apache.org"
Subject: RE: Separating the document dataset and the index dataset
Date: Fri, 7 Dec 2012 07:41:56 +0000

If you are using Lucene 4.0 and can afford to compress your document dataset while indexing, it will be a huge saving in terms of disk space and also in I/O (resulting in better indexing throughput).

In our case, it helped a lot: the compressed data was roughly one third the size of the original document dataset.

You may want to check the link below.

http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene

Regards,
Rahul

-----Original Message-----
From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
Sent: 07 December 2012 13:03
To: java-user@lucene.apache.org
Subject: Separating the document dataset and the index dataset

Greetings,

We are using Lucene in our log analysis tool. We get around 35 GB of data a day, and our practice is to zip indices that are a week old and unzip them when the need arises.

Though the compression offers a huge saving in disk space, the decompression becomes an overhead. At times it takes around 10 minutes (decompression takes 95% of the time) to search across a month-long set of logs. We need to unzip fully at least to get the total count from the index.

My question is: we are setting Index.Store to true. Is there a way we can split the index dataset and the document dataset? In my understanding, if separation is possible at all, the document dataset alone can be zipped, leaving the index dataset on disk?
Would it be feasible to do this? Any pointers? Or is adding more disks the only solution? Thanks in advance!

--
With Thanks and Regards,
Ramprakash Ramamoorthy
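
As a rough illustration of the idea discussed in this thread (compressing stored documents in small blocks, so that one document can be read back without unpacking the whole dataset), here is a minimal Java sketch. It uses only java.util.zip and is not Lucene's stored-fields implementation. The class name ChunkedDocStore, its methods, and the block size are all hypothetical, and it assumes each document is a single log line with no embedded newline.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; this is NOT how Lucene stores fields.
class ChunkedDocStore {
    private final int docsPerChunk;
    private final List<byte[]> compressedChunks = new ArrayList<>();
    private final List<Integer> rawChunkSizes = new ArrayList<>();
    private final List<String> pending = new ArrayList<>(); // docs not yet compressed
    private int docCount = 0;
    private long rawBytes = 0;

    ChunkedDocStore(int docsPerChunk) {
        this.docsPerChunk = docsPerChunk;
    }

    // Adds one document (assumed to contain no '\n') and returns its id.
    int add(String doc) {
        pending.add(doc);
        if (pending.size() == docsPerChunk) {
            flushChunk();
        }
        return docCount++;
    }

    // Compresses the buffered documents as one independent block.
    private void flushChunk() {
        byte[] raw = String.join("\n", pending).getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        compressedChunks.add(out.toByteArray());
        rawChunkSizes.add(raw.length);
        rawBytes += raw.length;
        pending.clear();
    }

    // Reads one document by decompressing only the block that contains it.
    String get(int docId) {
        if (docId < 0 || docId >= docCount) {
            throw new IndexOutOfBoundsException("no such document: " + docId);
        }
        int chunk = docId / docsPerChunk;
        if (chunk >= compressedChunks.size()) {
            // Document is still in the uncompressed buffer.
            return pending.get(docId - compressedChunks.size() * docsPerChunk);
        }
        byte[] raw = new byte[rawChunkSizes.get(chunk)];
        Inflater inflater = new Inflater();
        inflater.setInput(compressedChunks.get(chunk));
        try {
            int off = 0;
            while (off < raw.length) {
                int n = inflater.inflate(raw, off, raw.length - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                    break; // guard against looping on truncated data
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block " + chunk, e);
        } finally {
            inflater.end();
        }
        String[] docs = new String(raw, StandardCharsets.UTF_8).split("\n", -1);
        return docs[docId % docsPerChunk];
    }

    long rawBytes() { return rawBytes; }

    long compressedBytes() {
        long total = 0;
        for (byte[] c : compressedChunks) total += c.length;
        return total;
    }

    public static void main(String[] args) {
        ChunkedDocStore store = new ChunkedDocStore(100);
        for (int i = 0; i < 1000; i++) {
            store.add("2012-12-07 07:41:" + (i % 60) + " INFO request handled id=" + i);
        }
        System.out.println("raw bytes:        " + store.rawBytes());
        System.out.println("compressed bytes: " + store.compressedBytes());
        System.out.println("doc 543:          " + store.get(543));
    }
}
```

The point of the sketch is the trade-off: larger blocks usually compress better, but every lookup has to decompress a whole block, so block size balances disk savings against read latency. The actual ratio depends entirely on the data.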