lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Compressing field content with Lucene 3.0
Date Tue, 29 Dec 2009 10:57:04 GMT
It is still open to you how you handle it. On my projects I normally only
store string fields. If I compress them, they are binary. So I use a similar
approach like you to compress only large fields and store the others as
string fields like before.

When I retrieve the contents of the documents I use
Document.getField("name") and then use the Fieldable.isBinary() to detect if
it was stored binary (and therefore compressed) or if it is string.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Ivan Vasilev [mailto:ivasilev@sirma.bg]
> Sent: Tuesday, December 29, 2009 11:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: Compressing field content with Lucene 3.0
> 
> 10x Uwe for your answer,
> 
> It is good news that data compressed with Field.Store.COMPRESS with 2.4
> will be retrieved properly from 3.0.
> 
>  From your answer I understand that in 3.0 there is no API way to
> compress some of the values of some field and not to compress other
> values for the same field. We should choose either to compress all of
> the values of that field (and keep them as bin values) or not to
> compress any of values for that field?
> 
> Cheers,
> Ivan
> 
> 
> Uwe Schindler wrote:
> > If it is a 2.4 index, you can read it without any problems. It is only
> no
> > longer possible to add fields with Field.Store.COMPRESS. Nothing more
> > changed.
> >
> > If you want to add field with some compression, you have to compress
> > yourself e.g. to a byte[]. You can then add this byte[] as a binary
> stored
> > field. How you deal with this is in your responsibility.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Ivan Vasilev [mailto:ivasilev@sirma.bg]
> >> Sent: Monday, December 28, 2009 7:13 PM
> >> To: LUCENE MAIL LIST
> >> Subject: Compressing field content with Lucene 3.0
> >>
> >> Hi Guys,
> >>
> >> Could you give me advice how to deal with Lucene 3.0 with 2.4 indexes
> >> that contain compressed data.
> >>
> >> Our case is following - we have code like this:
> >>
> >> Field.Store fieldStored = storedFieldsSet.contains(fieldName) ?
> >> (fieldValue.length() >= COMPRESS_THRESHOLD ? Field.Store.COMPRESS :
> >> Field.Store.YES) : Field.Store.NO;
> >> Field.Index fieldIndexed = indexedFieldsSet.contains(fieldName) ?
> >> Field.Index.ANALYZED : Field.Index.NO;
> >> doc.add(new Field(fieldName, fieldValue, fieldStored, fieldIndexed));
> >>
> >> So for one and the same field some values are compressed in the indexes
> >> other values are not.
> >> Reading Lucene documentation I understand that all the values for some
> >> field should be either compressed or not compressed OR we should
> somehow
> >> keep which values for which fields are compressed. If this is so it
> >> would be very difficult to migrate our system to Lucene 3.0 without
> need
> >> of re indexing.
> >> If I am not right could you tell me please:
> >> 1. How to read these indexes (created with the code above with Lucene
> >> 2.4) with Lucene 3.0? I mean when fetching field values how to know
> when
> >> to use CompressionTools.decompressString(..) and when to skip it?
> >> 2. Is there possibility in Lucene 3.0 again to compress values of some
> >> docs for certain field and for other docs the values for the same field
> >> not to be compressed?
> >>
> >> Cheers,
> >> Ivan
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> > __________ NOD32 3990 (20090406) Information __________
> >
> > This message was checked by NOD32 antivirus system.
> > http://www.eset.com
> >
> >
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message