From: "Uwe Schindler"
Reply-To: java-user@lucene.apache.org
Subject: RE: Separating the document dataset and the index dataset
Date: Tue, 11 Dec 2012 10:44:29 +0100
Message-ID: <004f01cdd784$1d798310$586c8930$@thetaphi.de>

You can use Lucene 4.1 nightly builds from http://goo.gl/jZ6YD - it is not
yet released, but upgrading from Lucene 4.0 is easy. If you are not yet on
Lucene 4.0, there is more work to do, in that case a solution to your problem
would be to save the stored fields in a separate database/whatever and only
add *one* stored field to your index, containing the document ID inside this
external database.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> Sent: Tuesday, December 11, 2012 10:32 AM
> To: java-user@lucene.apache.org
> Subject: Re: Separating the document dataset and the index dataset
>
> On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul wrote:
>
> > If you are using lucene 4.0 and afford to compress your document
> > dataset while indexing, it will be a huge savings in terms of disk
> > space and also in IO (resulting in indexing throughput).
> >
> > In our case, it has helped us a lot as compressed data size was
> > roughly 3 times less than of original document data set size.
> >
> > You may want to check the below link.
> >
> > http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
> >
> > Regards,
> > Rahul
> >
>
> Thank you Rahul. That indeed seems promising. Just one doubt, how do I
> plug this CompressingStoredFieldsFormat into my app, as in I tried bundling
> it in a codec, but not sure if I am proceeding in the right path. Any pointers
> would be of great help!
>
> > -----Original Message-----
> > From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> > Sent: 07 December 2012 13:03
> > To: java-user@lucene.apache.org
> > Subject: Separating the document dataset and the index dataset
> >
> > Greetings,
> >
> > We are using lucene in our log analysis tool.
> > We get data around 35Gb a day and we have this practice of zipping
> > week old indices and then unzip when need arises.
> >
> > Though the compression offers a huge saving with respect to disk
> > space, the decompression becomes an overhead. At times it takes around
> > 10 minutes (de-compression takes 95% of the time) to search across a
> > month long set of logs. We need to unzip fully at least to get the
> > total count from the index.
> >
> > My question is, we are setting Index.Store to true. Is there a way
> > where we can split the index dataset and the document dataset. In my
> > understanding, if at all separation is possible, the document dataset
> > can alone be zipped leaving the index dataset on disk? Will it be
> > tangible to do this? Any pointers?
> >
> > Or is adding more disks the only solution? Thanks in advance!
> >
> > --
> > With Thanks and Regards,
> > Ramprakash Ramamoorthy,
> > +91 9626975420
> > This email and any attachments are confidential, and may be legally
> > privileged and protected by copyright. If you are not the intended
> > recipient dissemination or copying of this email is prohibited. If you
> > have received this in error, please notify the sender by replying by
> > email and then delete the email completely from your system. Any views
> > or opinions are solely those of the sender. This communication is not
> > intended to form a binding contract unless expressly indicated to the
> > contrary and properly authorised. Any actions taken on the basis of
> > this email are at the recipient's own risk.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> Engineer Trainee,
> Zoho Corporation.
> +91 9626975420
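
For readers of the archive, the external-store idea from the reply at the top of this message (keep the document bodies in a separate database and leave only a document ID in the index) can be sketched roughly as follows. This is plain Python using only the standard library, not Lucene code: `ExternalStore`, `IdOnlyIndex`, the SQLite table layout, and the sample log lines are all hypothetical names chosen for illustration, and the toy inverted index merely stands in for a real Lucene index.

```python
import sqlite3
import zlib
from collections import defaultdict


class ExternalStore:
    """Holds compressed document bodies outside the search index (hypothetical)."""

    def __init__(self):
        # An in-memory SQLite database stands in for "a separate database/whatever".
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

    def put(self, text):
        # Compress each body individually so one document can be read back
        # without decompressing anything else.
        blob = zlib.compress(text.encode("utf-8"))
        cur = self.db.execute("INSERT INTO docs (body) VALUES (?)", (blob,))
        return cur.lastrowid  # this ID is the only thing the index keeps

    def get(self, doc_id):
        row = self.db.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        return zlib.decompress(row[0]).decode("utf-8")


class IdOnlyIndex:
    """Toy inverted index mapping each term to the IDs of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), ()))

    def count(self, term):
        # Counting hits never touches the compressed bodies.
        return len(self.postings.get(term.lower(), ()))


store = ExternalStore()
index = IdOnlyIndex()
for line in [
    "ERROR disk full on node1",
    "INFO backup finished",
    "ERROR timeout on node2",
]:
    index.add(store.put(line), line)

hits = index.search("error")
print(index.count("error"), "matching log lines")
for doc_id in hits:
    print(doc_id, store.get(doc_id))
```

The point of the split is that searching and counting work on the uncompressed index alone, while decompression happens only for the few documents that are actually displayed.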