Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
From: "Uwe Schindler" <uschindler@pangaea.de>
To: <java-user@lucene.apache.org>
References: 
 <CAPzrt-2SUP5g24GiV3ekqDG8yeb9B-KpqkB42Gsykn2XKY+P7Q@mail.gmail.com>
	<04119417CE2C6E43B71CAFC4AD4E22BB7C1EE13A@HYDSVWIN-EXAR1.ivycomptech.partygaming.local>
	<CAPzrt-37Ea59BXFX5Gz730bO3NaVSnVOhOpez_LCEL2nmYfmOQ@mail.gmail.com>
	<004f01cdd784$1d798310$586c8930$@thetaphi.de>
 <CAPzrt-1M3midTRYMBpnyiR+3Cm=z4rOfNeWt7VoWA7sXWGeC7g@mail.gmail.com>
In-Reply-To: 
 <CAPzrt-1M3midTRYMBpnyiR+3Cm=z4rOfNeWt7VoWA7sXWGeC7g@mail.gmail.com>
Subject: RE: Separating the document dataset and the index dataset
Date: Tue, 11 Dec 2012 11:40:35 +0100
Message-ID: <007801cdd78b$f41cda50$dc568ef0$@pangaea.de>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Thread-Index: AQLdb0CMjpYBCLR4mpdD24j5ya+4ugFeiT5pAh2/WCcCi6D2UAK5mzP1la4s/sA=
Content-Language: de

In Lucene 4.1 the compressing codec is no longer a separate codec; the
main Codec ("Lucene41") compresses stored fields by default. Just reindex
your data or use IndexUpgrader.
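Why this helps compared with zipping whole indices: as far as I know, the Lucene41 stored fields format compresses a few documents together in chunks (16 KB, LZ4 by default), so retrieving one hit decompresses only its chunk rather than a whole archive. The toy Java sketch below illustrates that chunking idea only. It is not Lucene's implementation, file format, or API: it swaps in java.util.zip's Deflater for LZ4, keeps everything in memory, and ChunkedDocStore and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical toy store (not Lucene's format): documents are buffered and
// compressed together once the buffer reaches chunkSize bytes.
class ChunkedDocStore {
    private static final class Chunk {
        final int firstDoc;      // ID of the first document in the chunk
        final int[] offsets;     // doc start offsets in the raw chunk, plus the end offset
        final byte[] compressed; // Deflate-compressed concatenation of the docs
        Chunk(int firstDoc, int[] offsets, byte[] compressed) {
            this.firstDoc = firstDoc;
            this.offsets = offsets;
            this.compressed = compressed;
        }
    }

    private final int chunkSize;
    private final List<Chunk> chunks = new ArrayList<>();
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final List<Integer> pendingOffsets = new ArrayList<>();
    private int numDocs = 0;
    private int pendingFirstDoc = 0;
    private long rawBytes = 0;

    ChunkedDocStore(int chunkSize) { this.chunkSize = chunkSize; }

    // Appends a document and returns its ID; compresses a chunk when the buffer is full.
    int add(String doc) {
        int docId = numDocs++;
        byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
        pendingOffsets.add(pending.size());
        pending.write(bytes, 0, bytes.length);
        rawBytes += bytes.length;
        if (pending.size() >= chunkSize) flush();
        return docId;
    }

    // Returns one document, decompressing only the chunk that contains it.
    String get(int docId) {
        if (docId < 0 || docId >= numDocs) throw new IndexOutOfBoundsException("doc " + docId);
        if (docId >= pendingFirstDoc) flush(); // doc is still in the uncompressed buffer
        int lo = 0, hi = chunks.size() - 1;    // binary search: last chunk with firstDoc <= docId
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (chunks.get(mid).firstDoc <= docId) lo = mid; else hi = mid - 1;
        }
        Chunk c = chunks.get(lo);
        byte[] raw = inflate(c.compressed, c.offsets[c.offsets.length - 1]);
        int i = docId - c.firstDoc;
        return new String(raw, c.offsets[i], c.offsets[i + 1] - c.offsets[i], StandardCharsets.UTF_8);
    }

    int chunkCount() { flush(); return chunks.size(); }

    // Raw bytes divided by compressed payload bytes (chunk metadata ignored).
    double compressionRatio() {
        flush();
        long compressed = 0;
        for (Chunk c : chunks) compressed += c.compressed.length;
        return compressed == 0 ? 1.0 : (double) rawBytes / compressed;
    }

    private void flush() {
        if (pendingOffsets.isEmpty()) return;
        int n = pendingOffsets.size();
        int[] offsets = new int[n + 1];
        for (int i = 0; i < n; i++) offsets[i] = pendingOffsets.get(i);
        offsets[n] = pending.size();
        chunks.add(new Chunk(pendingFirstDoc, offsets, deflate(pending.toByteArray())));
        pendingFirstDoc = numDocs;
        pending.reset();
        pendingOffsets.clear();
    }

    private static byte[] deflate(byte[] data) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        try {
            d.setInput(data);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) out.write(buf, 0, d.deflate(buf));
            return out.toByteArray();
        } finally { d.end(); }
    }

    private static byte[] inflate(byte[] data, int rawLength) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(data);
            byte[] out = new byte[rawLength];
            int n = 0;
            while (n < rawLength) {
                int r = inf.inflate(out, n, rawLength - n);
                if (r == 0 && (inf.finished() || inf.needsInput() || inf.needsDictionary()))
                    throw new IllegalStateException("corrupt or truncated chunk");
                n += r;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk", e);
        } finally { inf.end(); }
    }
}
```

The chunk size is the knob: smaller chunks mean less to decompress per hit but a worse ratio, because Deflate sees less repeated text per chunk.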

Uwe

-----
UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Data Publisher for Earth & Environmental Science
MARUM (Cognium building) - University of Bremen
Room 0510, Hochschulring 18, D-28359 Bremen
Tel.: +49 421 218 65595
Fax:  +49 421 218 65505
http://www.pangaea.de/
E-mail: uschindler@pangaea.de


> -----Original Message-----
> From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> Sent: Tuesday, December 11, 2012 11:36 AM
> To: java-user@lucene.apache.org
> Subject: Re: Separating the document dataset and the index dataset
>
> On Tue, Dec 11, 2012 at 3:14 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> > You can use Lucene 4.1 nightly builds from http://goo.gl/jZ6YD. It is
> > not yet released, but upgrading from Lucene 4.0 is easy. If you are
> > not yet on Lucene 4.0, there is more work to do; in that case, a
> > solution to your problem would be to save the stored fields in a
> > separate database/whatever and only add *one* stored field to your
> > index, containing the document's ID in that external database.
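A minimal sketch of that external-store approach, assuming the raw log lines are kept outside Lucene in compressed form. The HashMap stands in for the separate database, and ExternalDocStore with its put/get methods is a hypothetical name, not a Lucene API. On a pre-4.0 index, each Lucene document would then index its text unstored and carry one stored field with the returned ID, e.g. `new Field("extId", Long.toString(id), Field.Store.YES, Field.Index.NO)`, where "extId" is a made-up field name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical external store: raw documents are kept compressed outside
// Lucene, keyed by an ID. The HashMap stands in for a real database.
class ExternalDocStore {
    private final Map<Long, byte[]> blobs = new HashMap<>();
    private long nextId = 0;

    // Compresses and stores one raw document; the returned ID is the only
    // value that would be stored in the Lucene index.
    long put(String rawDoc) {
        long id = nextId++;
        blobs.put(id, deflate(rawDoc.getBytes(StandardCharsets.UTF_8)));
        return id;
    }

    // Decompresses only the document that a search hit points at.
    String get(long id) {
        byte[] blob = blobs.get(id);
        if (blob == null) {
            throw new IllegalArgumentException("unknown id " + id);
        }
        return new String(inflate(blob), StandardCharsets.UTF_8);
    }

    private static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    private static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("corrupt or truncated blob");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt blob", e);
        } finally {
            inflater.end();
        }
    }
}
```

Hit counts then come from the index alone, which can stay unzipped on disk; only the documents actually displayed are decompressed, one blob each. Compressing each log line on its own gives a worse ratio than compressing larger blocks, so a real store would likely batch lines.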
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
>
>
> Thank you Uwe. I already tried the nightly build, but the codecs.jar in it
> doesn't contain a compressing codec at all. I also tried checking out trunk
> and compiling it, with the same result: *org.apache.lucene.codecs.compressing*
> is missing. Any pointers?
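A quick way to see which jar, if any, provides a class. As far as I know, in 4.1 the org.apache.lucene.codecs.compressing package is part of lucene-core (the default Lucene41StoredFieldsFormat is built on CompressingStoredFieldsFormat), not of lucene-codecs, which would explain not finding it in codecs.jar. The stdlib-only ClasspathProbe class below is a hypothetical diagnostic; run its main with the Lucene jars on the classpath.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: reports whether a class is on the classpath and,
// if so, which jar (or directory) it was loaded from.
class ClasspathProbe {
    static String locate(String className) {
        try {
            // 'false' = do not run static initializers, just resolve the class
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes have no code source; report them without a location
            return src == null ? "bootstrap/unknown" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null; // not on the classpath
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat";
        String where = locate(name);
        System.out.println(where == null ? name + " not found" : name + " loaded from " + where);
    }
}
```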
>=20
> >
> >
> >
> > > -----Original Message-----
> > > From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> > > Sent: Tuesday, December 11, 2012 10:32 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Separating the document dataset and the index dataset
> > >
> > > On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul <jainr@ivycomptech.com> wrote:
> > >
> > > > If you are using Lucene 4.0 and can afford to compress your document
> > > > dataset while indexing, it will be a huge saving in terms of disk
> > > > space and also in I/O (resulting in better indexing throughput).
> > > >
> > > > In our case, it has helped us a lot, as the compressed data was
> > > > roughly a third of the size of the original document dataset.
> > > >
> > > > You may want to check the link below.
> > > >
> > > > http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
> > > >
> > > > Regards,
> > > > Rahul
> > > >
> > >
> > > Thank you Rahul. That indeed seems promising. Just one question: how
> > > do I plug this CompressingStoredFieldsFormat into my app? I tried
> > > bundling it in a codec, but I am not sure I am on the right path.
> > > Any pointers would be of great help!
> > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Ramprakash Ramamoorthy [mailto:youngestachiever@gmail.com]
> > > > Sent: 07 December 2012 13:03
> > > > To: java-user@lucene.apache.org
> > > > Subject: Separating the document dataset and the index dataset
> > > >
> > > > Greetings,
> > > >
> > > >          We are using Lucene in our log analysis tool. We get
> > > > around 35 GB of data a day, and our practice is to zip week-old
> > > > indices and unzip them when the need arises.
> > > >
> > > >            Though the compression offers a huge saving in disk
> > > > space, the decompression becomes an overhead. At times it takes
> > > > around 10 minutes to search across a month of logs, and
> > > > decompression accounts for 95% of that time. We have to unzip
> > > > fully even just to get the total count from the index.
> > > >
> > > >            My question is: we are setting Index.Store to true. Is
> > > > there a way to split the index dataset from the document
> > > > dataset? As I understand it, if such a separation is possible,
> > > > the document dataset alone could be zipped, leaving the index
> > > > dataset on disk. Would that be feasible? Any pointers?
> > > >
> > > >            Or is adding more disks the only solution? Thanks in advance!
> > > >
> > > > --
> > > > With Thanks and Regards,
> > > > Ramprakash Ramamoorthy,
> > > > +91 9626975420
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > With Thanks and Regards,
> > > Ramprakash Ramamoorthy,
> > > Engineer Trainee,
> > > Zoho Corporation.
> > > +91 9626975420
> >
> >
> >
> >
>
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> Engineer Trainee,
> Zoho Corporation.
> +91 9626975420


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org