From: "Uwe Schindler" <uwe@thetaphi.de>
To: <java-user@lucene.apache.org>
Subject: RE: Separating the document dataset and the index dataset
Date: Tue, 11 Dec 2012 10:44:29 +0100

You can use the Lucene 4.1 nightly builds from http://goo.gl/jZ6YD.
Lucene 4.1 is not yet released, but upgrading from Lucene 4.0 is easy.
If you are not yet on Lucene 4.0, there is more work to do. In that
case, one solution to your problem would be to keep the stored fields in
a separate database (or similar store) and add only *one* stored field
to your index, containing the ID of the document in that external
database.
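
To illustrate the second option, here is a minimal sketch of the idea
only. It does not use the Lucene API: plain Java collections stand in
for both the index and the external database, and every name in it
(SplitIndexSketch, add, search, count, fetch) is hypothetical. In Lucene
terms, the body field would be indexed with Field.Store.NO and only the
ID field would use Field.Store.YES.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: plain Java collections stand in for both the Lucene
// index and the external database. None of these names are Lucene API.
class SplitIndexSketch {
    // "Index" side: term -> IDs of the documents containing it. No body text.
    private final Map<String, TreeSet<String>> postings = new HashMap<>();
    // "External database" side: ID -> full document body.
    private final Map<String, String> externalStore = new HashMap<>();

    // Store the body externally and index its terms under the external ID.
    void add(String id, String body) {
        externalStore.put(id, body);
        for (String term : body.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Searching touches only the index and returns IDs, not bodies.
    List<String> search(String term) {
        TreeSet<String> ids = postings.get(term.toLowerCase());
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    // Hit counts also come from the index alone.
    int count(String term) {
        return search(term).size();
    }

    // A body is fetched from the external store only when it is displayed.
    String fetch(String id) {
        return externalStore.get(id);
    }

    public static void main(String[] args) {
        SplitIndexSketch idx = new SplitIndexSketch();
        idx.add("log-1", "Disk full on host alpha");
        idx.add("log-2", "Login failed on host beta");
        for (String id : idx.search("host")) {
            System.out.println(id + ": " + idx.fetch(id));
        }
    }
}
```

With this split, searching and counting never read the document bodies,
so the external store can be compressed or archived independently of the
index.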

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> Sent: Tuesday, December 11, 2012 10:32 AM
> To: java-user@lucene.apache.org
> Subject: Re: Separating the document dataset and the index dataset
>
> On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul <jainr@ivycomptech.com> wrote:
>
> > If you are using Lucene 4.0 and can afford to compress your document
> > dataset while indexing, it will be a huge saving in terms of disk
> > space and also in I/O (resulting in better indexing throughput).
> >
> > In our case, it has helped us a lot, as the compressed data was
> > roughly a third of the size of the original document dataset.
> >
> > You may want to check the link below.
> >
> > http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
> >
> > Regards,
> > Rahul
> >
>
> Thank you, Rahul. That indeed seems promising. Just one question: how
> do I plug this CompressingStoredFieldsFormat into my app? I tried
> bundling it in a codec, but I am not sure if I am on the right path.
> Any pointers would be of great help!
>
> >
> >
> > -----Original Message-----
> > From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> > Sent: 07 December 2012 13:03
> > To: java-user@lucene.apache.org
> > Subject: Separating the document dataset and the index dataset
> >
> > Greetings,
> >
> > We are using Lucene in our log analysis tool. We get around 35 GB of
> > data a day, and our practice is to zip week-old indices and unzip
> > them when the need arises.
> >
> > Though the compression offers a huge saving in disk space, the
> > decompression becomes an overhead. At times it takes around 10
> > minutes (decompression takes 95% of the time) to search across a
> > month-long set of logs. We need to unzip fully at least to get the
> > total count from the index.
> >
> > My question is: we are setting Index.Store to true. Is there a way
> > to split the index dataset and the document dataset? In my
> > understanding, if separation is possible at all, the document
> > dataset alone can be zipped, leaving the index dataset on disk.
> > Would it be feasible to do this? Any pointers?
> >
> > Or is adding more disks the only solution? Thanks in advance!
> >
> > --
> > With Thanks and Regards,
> > Ramprakash Ramamoorthy,
> > +91 9626975420
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> Engineer Trainee,
> Zoho Corporation.
> +91 9626975420

