lucene-dev mailing list archives

From Nicolas Lalevée <nicolas.lale...@anyware-tech.com>
Subject Re: [jira] Commented: (LUCENE-648) Allow changing of ZIP compression level for compressed fields
Date Wed, 16 Aug 2006 12:32:13 GMT
On Monday 14 August 2006 20:44, Michael McCandless wrote:
> >> If you make the compression external this is already done. In order
> >> to do what the poster requires, you still need to read and update
> >> fields without reading the entire document. You just do this at a
> >> binary field level, and do all of the compression/decompression
> >> externally.
> >>
> >> I think putting the compression into Lucene needlessly complicates
> >> matters. All that is required is in place field updating, and binary
> >> field support.
> >
> > I agree with you.
> > The API should be kept compatible between versions, but what about
> > breaking the compatibility in trunk? Will it be a problem if the
> > function Fieldable.isCompressed() is removed?
>
> OK I think this makes total sense.  I've opened an issue to track this:
>
>    http://issues.apache.org/jira/browse/LUCENE-652

Hi,

In the issue, you wrote that "This way the indexing level just stores opaque 
binary fields, and then Document handles compress/uncompressing as needed."

I have looked into the Lucene code, and it seems to me that it is Field that 
should take care of compressing/uncompressing, and that FieldsReader and 
FieldsWriter should only see binary data.
Or do you mean that compression should be completely external to Lucene?
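
For comparison, fully external compression could look like the sketch below: the application deflates the value at whatever level it chooses and stores the result as an opaque binary field. It uses only java.util.zip; the class and method names (ExternalFieldCompression, compress, decompress) are invented for illustration and nothing here is Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compression lives in the application, the index only sees bytes.
class ExternalFieldCompression {

    // Deflate a field value at the requested level (0-9, or Deflater.DEFAULT_COMPRESSION).
    static byte[] compress(byte[] raw, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Inflate bytes produced by compress(); rejects truncated or corrupt input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] raw = "stored field value, stored field value".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw, Deflater.BEST_SPEED);
        System.out.println(Arrays.equals(raw, decompress(packed)));
    }
}
```

With this approach the compression level is simply an argument chosen by the caller, so the index format does not need to know about it.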

In fact, at the end of the other thread "Flexible index format / Payloads 
Cont'd", I was discussing how to customize the way data is stored. So I 
have looked deeper into the code and I think I have found a way to do so. And 
since you can change the way data is stored, you can also define the 
compression level, or plug in your own compression algorithm. I will show you 
a patch, but I have modified so much code over my several attempts that I 
first need to remove the unnecessary changes. To describe it briefly:
- I have added a way to supply your own FieldsReader and FieldsWriter (via 
a factory). To create an IndexReader, you have to provide that factory; the 
current API just uses a default factory.
- I have moved the code of FieldsReader and FieldsWriter that reads and writes 
the field data to a new class, FieldData. The FieldsReader instantiates a 
FieldData, calls fielddata.read(input), and creates a new Field(fielddata,...). 
The FieldsWriter calls field.getFieldData().write(output);
- so by extending FieldsReader, you can provide your own implementation of 
FieldData, and thus control how data is stored and 
read.
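
The three points above can be sketched roughly as follows. This is not the patch: every type is a simplified stand-in invented for illustration, java.io.DataInput/DataOutput replace Lucene's index input and output classes, and the factory here creates FieldData instances directly rather than whole readers and writers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// All nested types are hypothetical stand-ins, not Lucene classes.
class FieldDataSketch {

    // Owns the stored form of one field value.
    interface FieldData {
        void read(DataInput in) throws IOException;
        void write(DataOutput out) throws IOException;
        byte[] value();
    }

    // Handed to the reader; a default factory would keep the existing behaviour.
    interface FieldDataFactory {
        FieldData newFieldData();
    }

    // Default storage: a 4-byte length prefix followed by the raw bytes.
    static class RawFieldData implements FieldData {
        private byte[] value;
        RawFieldData(byte[] value) { this.value = value; }
        public void read(DataInput in) throws IOException {
            value = new byte[in.readInt()];
            in.readFully(value);
        }
        public void write(DataOutput out) throws IOException {
            out.writeInt(value.length);
            out.write(value);
        }
        public byte[] value() { return value; }
    }

    static class Field {
        final String name;
        private final FieldData data;
        Field(String name, FieldData data) { this.name = name; this.data = data; }
        FieldData getFieldData() { return data; }
    }

    // Reader side: instantiate a FieldData, let it read itself, wrap it in a Field.
    static Field readField(FieldDataFactory factory, DataInput in) throws IOException {
        String name = in.readUTF();
        FieldData data = factory.newFieldData();
        data.read(in);
        return new Field(name, data);
    }

    // Writer side: the field's own FieldData decides which bytes are written.
    static void writeField(Field field, DataOutput out) throws IOException {
        out.writeUTF(field.name);
        field.getFieldData().write(out);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Field original = new Field("body", new RawFieldData(new byte[] {1, 2, 3}));
        writeField(original, new DataOutputStream(bytes));

        FieldDataFactory factory = () -> new RawFieldData(new byte[0]);
        Field back = readField(factory,
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(back.name + " " + back.getFieldData().value().length);
    }
}
```

A compressing variant would be another FieldData implementation that deflates in write and inflates in read, selected by passing a different factory.
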
The tests pass, but I have an issue with that design: one important property 
of the current design is that we can read an index in an old format and just 
do a writer.addIndexes() to get it into the new format. With the new design 
you cannot, because the writer will use the FieldData.write provided by the 
reader.
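
The problem can be illustrated with a toy example. The two formats and all names below are made up (a 2-byte versus a 4-byte length prefix); the point is only that a writer which delegates to the field's own data object emits whatever format that object was read in.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the format-mixing issue; none of this is Lucene API.
class FormatMixSketch {

    // Stand-in for the proposed FieldData: each instance knows how to write itself.
    interface FormatAwareData {
        void write(DataOutput out) throws IOException;
    }

    // Invented "old" format: 2-byte length prefix.
    static class OldFormatData implements FormatAwareData {
        private final byte[] value;
        OldFormatData(byte[] value) { this.value = value; }
        public void write(DataOutput out) throws IOException {
            out.writeShort(value.length);
            out.write(value);
        }
    }

    // Invented "new" format: 4-byte length prefix.
    static class NewFormatData implements FormatAwareData {
        private final byte[] value;
        NewFormatData(byte[] value) { this.value = value; }
        public void write(DataOutput out) throws IOException {
            out.writeInt(value.length);
            out.write(value);
        }
    }

    // A writer that delegates to the data object has no say over the format.
    static byte[] store(FormatAwareData data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            data.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] value = {1, 2, 3};
        // A field that came from an old-format reader is stored in the old format...
        System.out.println(store(new OldFormatData(value)).length); // prints 5
        // ...while a field created for the new format is stored in the new one.
        System.out.println(store(new NewFormatData(value)).length); // prints 7
    }
}
```

Under these assumptions, a fix would need the writer to convert the value into its own FieldData implementation before writing, rather than reusing the one supplied by the reader.
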
To be continued...

cheers,
Nicolas


