MIME-Version: 1.0
In-Reply-To: <007801cdd78b$f41cda50$dc568ef0$@pangaea.de>
References: <04119417CE2C6E43B71CAFC4AD4E22BB7C1EE13A@HYDSVWIN-EXAR1.ivycomptech.partygaming.local>
 <004f01cdd784$1d798310$586c8930$@thetaphi.de>
 <007801cdd78b$f41cda50$dc568ef0$@pangaea.de>
Date: Tue, 11 Dec 2012 19:21:18 +0530
Subject: Re: Separating the document dataset and the index dataset
From: Ramprakash Ramamoorthy
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=047d7b343abae550f604d093f9e1

--047d7b343abae550f604d093f9e1
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Dec 11, 2012 at 4:10 PM, Uwe Schindler wrote:

> In Lucene 4.1 the compressing codec is no longer a separate codec, the
> main Codec ("Lucene41") compresses by default. Just reindex your data or
> use IndexUpgrader.
>

Thanks Uwe. This one helped. My index size came down from 816 MB to 198 MB.
Win!
>
> Uwe
>
> -----
> UWE SCHINDLER
> Webserver/Middleware Development
> PANGAEA - Data Publisher for Earth & Environmental Science
> MARUM (Cognium building) - University of Bremen
> Room 0510, Hochschulring 18, D-28359 Bremen
> Tel.: +49 421 218 65595
> Fax: +49 421 218 65505
> http://www.pangaea.de/
> E-mail: uschindler@pangaea.de
>
> > -----Original Message-----
> > From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> > Sent: Tuesday, December 11, 2012 11:36 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Separating the document dataset and the index dataset
> >
> > On Tue, Dec 11, 2012 at 3:14 PM, Uwe Schindler wrote:
> >
> > > You can use Lucene 4.1 nightly builds from http://goo.gl/jZ6YD - it is
> > > not yet released, but upgrading from Lucene 4.0 is easy. If you are
> > > not yet on Lucene 4.0, there is more work to do, in that case a
> > > solution to your problem would be to save the stored fields in a
> > > separate database/whatever and only add *one* stored field to your
> > > index, containing the document ID inside this external database.
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> >
> > Thank you Uwe. Already tried with the nightly build, but the codecs.jar
> > in it isn't having a compressing codec at all, Tried pulling out from the
> > trunk and then compiling, same issue,
> > *org.apache.lucene.codecs.compressing* is missing. Any pointers?
> >
> > > > -----Original Message-----
> > > > From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> > > > Sent: Tuesday, December 11, 2012 10:32 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: Separating the document dataset and the index dataset
> > > >
> > > > On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul wrote:
> > > >
> > > > > If you are using lucene 4.0 and afford to compress your document
> > > > > dataset while indexing, it will be a huge savings in terms of disk
> > > > > space and also in IO (resulting in indexing throughput).
> > > > >
> > > > > In our case, it has helped us a lot as compressed data size was
> > > > > roughly 3 times less than of original document data set size.
> > > > >
> > > > > You may want to check the below link.
> > > > >
> > > > > http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
> > > > >
> > > > > Regards,
> > > > > Rahul
> > > > >
> > > >
> > > > Thank you Rahul. That indeed seems promising. Just one doubt, how do
> > > > I plug this CompressingStoredFieldsFormat into my app, as in I
> > > > tried bundling it in a codec, but not sure if I am proceeding in the
> > > > right path. Any pointers would be of great help!
> > > >
> > > > > -----Original Message-----
> > > > > From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> > > > > Sent: 07 December 2012 13:03
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Separating the document dataset and the index dataset
> > > > >
> > > > > Greetings,
> > > > >
> > > > > We are using lucene in our log analysis tool. We get data
> > > > > around 35Gb a day and we have this practice of zipping week old
> > > > > indices and then unzip when need arises.
> > > > >
> > > > > Though the compression offers a huge saving with respect to disk
> > > > > space, the decompression becomes an overhead. At times it takes
> > > > > around 10 minutes (de-compression takes 95% of the time) to search
> > > > > across a month long set of logs. We need to unzip fully atleast to
> > > > > get the total count from the index.
> > > > >
> > > > > My question is, we are setting Index.Store to true. Is there a way
> > > > > where we can split the index dataset and the document dataset. In
> > > > > my understanding, if at all separation is possible, the document
> > > > > dataset can alone be zipped leaving the index dataset on disk?
> > > > > Will it be tangible to do this? Any pointers?
> > > > >
> > > > > Or is adding more disks the only solution? Thanks in advance!
> > > > >
> > > > > --
> > > > > With Thanks and Regards,
> > > > > Ramprakash Ramamoorthy,
> > > > > +91 9626975420
> > > > >
> > > > > This email and any attachments are confidential, and may be
> > > > > legally privileged and protected by copyright. If you are not the
> > > > > intended recipient dissemination or copying of this email is
> > > > > prohibited. If you have received this in error, please notify the
> > > > > sender by replying by email and then delete the email completely
> > > > > from your system. Any views or opinions are solely those of the
> > > > > sender. This communication is not intended to form a binding
> > > > > contract unless expressly indicated to the contrary and properly
> > > > > authorised. Any actions taken on the basis of this email are at
> > > > > the recipient's own risk.
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
>

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420

--047d7b343abae550f604d093f9e1--
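
For readers who are not yet on Lucene 4.0, the alternative Uwe describes in the thread (keep the stored fields in a separate store and put only one stored field, the document ID, in the index) could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the thread: the class ExternalDocStore and its methods are hypothetical names, an in-memory HashMap stands in for the "separate database/whatever", and java.util.zip's Deflater is just one possible per-document compressor. No Lucene API is used here; the index side would only carry the ID string.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for an external document store keyed by document ID. */
final class ExternalDocStore {
    // Assumption: an in-memory map replaces the real database or file store.
    private final Map<String, byte[]> compressedById = new HashMap<>();

    /** Compresses one document on its own and stores it under its ID. */
    void put(String docId, String rawText) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(rawText.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        compressedById.put(docId, out.toByteArray());
    }

    /** Inflates only the requested document; returns null for an unknown ID. */
    String get(String docId) {
        byte[] compressed = compressedById.get(docId);
        if (compressed == null) {
            return null;
        }
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated data: stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt entry for " + docId, e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Compressed size in bytes, or -1 if the ID is unknown. */
    int compressedSize(String docId) {
        byte[] compressed = compressedById.get(docId);
        return compressed == null ? -1 : compressed.length;
    }

    public static void main(String[] args) {
        ExternalDocStore store = new ExternalDocStore();
        StringBuilder logs = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            logs.append("2012-12-11 INFO request served\n");
        }
        // The search index would keep only "doc-42" as its single stored field.
        store.put("doc-42", logs.toString());
        System.out.println("round trip ok: " + logs.toString().equals(store.get("doc-42")));
        System.out.println("raw bytes: " + logs.length()
                + ", compressed bytes: " + store.compressedSize("doc-42"));
    }
}
```

Because each document is compressed separately, fetching a search hit inflates only that one document while the index itself stays uncompressed on disk, which avoids the unzip-everything overhead described in the original question.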