lucene-java-user mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: Toggling compression for stored fields
Date Wed, 15 May 2013 23:15:51 GMT
You are right, this was only possible in early versions...

You have to write a non-anonymous, public subclass of FilterCodec and list it in the
META-INF/services folder of your JAR.
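
A minimal sketch of that approach, assuming Lucene 4.1 on the classpath. The package name com.example.codec is hypothetical; the codec name and the use of Lucene40StoredFieldsFormat as the uncompressed format are taken from the snippet quoted further down. The class is public and has a public no-argument constructor so that the SPI lookup can instantiate it by name:

```java
package com.example.codec; // hypothetical package, pick your own

import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.lucene40.Lucene40StoredFieldsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41Codec;

/**
 * Delegates everything to Lucene41Codec except stored fields,
 * which are written with the uncompressed Lucene 4.0 format.
 */
public final class DisableStoreFieldCompressionCodec extends FilterCodec {

  private final StoredFieldsFormat storedFields = new Lucene40StoredFieldsFormat();

  // Public no-arg constructor: required so the codec can be loaded by name.
  public DisableStoreFieldCompressionCodec() {
    // This name is recorded in the index and looked up again when it is opened.
    super("DisableStoreFieldCompressionCodec", new Lucene41Codec());
  }

  @Override
  public StoredFieldsFormat storedFieldsFormat() {
    return storedFields;
  }
}
```

The registration file would then be META-INF/services/org.apache.lucene.codecs.Codec inside the JAR, containing the single line com.example.codec.DisableStoreFieldCompressionCodec. With that in place, pass new DisableStoreFieldCompressionCodec() to IndexWriterConfig.setCodec(...) when writing; on open, the codec is resolved by its name through Codec.forName.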



Vitaly Funstein <vfunstein@gmail.com> wrote:

>Yes, I thought about inlining an anonymous subclass of Lucene41Codec, but
>unfortunately all of its methods are final, which effectively rules out
>this approach. I think I may have to do the latter, since I am obviously in
>control of internal JAR packaging anyway...
>
>On Wed, May 15, 2013 at 4:06 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> You don't change the Codec at all, just the stored fields implementation,
>> so you don't need to give it a new name. The simplest way is to anonymously
>> subclass Lucene41Codec without FilterCodec.
>>
>> If your codec gets a new name, this name must be registered in the codec
>> manager by adding META-INF files to your JAR and not using anonymous
>> subclasses.
>>
>>
>>
>> Vitaly Funstein <vfunstein@gmail.com> wrote:
>>
>> >Uwe,
>> >
>> >I may not be doing this correctly, but I tried to see what would happen if
>> >I were to reopen an index created with a custom codec that disables
>> >stored fields compression, and it doesn't seem to work. Here's how I
>> >configure the writer to disable compression, prior to indexing:
>> >
>> >     final StoredFieldsFormat sfFmt = new Lucene40StoredFieldsFormat();
>> >     idxWriterCfg.setCodec(
>> >         new FilterCodec("DisableStoreFieldCompressionCodec", new Lucene41Codec()) {
>> >
>> >           @Override
>> >           public StoredFieldsFormat storedFieldsFormat() {
>> >             return sfFmt;
>> >           }
>> >
>> >         });
>> >
>> >However, when an index that was created with this writer configuration is
>> >opened, I get this exception:
>> >
>> >Exception in thread "main" java.lang.IllegalArgumentException: A SPI class
>> >of type org.apache.lucene.codecs.Codec with name
>> >'DisableStoreFieldCompressionCodec' does not exist. You need to add the
>> >corresponding JAR file supporting this SPI to your classpath.The current
>> >classpath supports the following names: [Lucene40, Lucene3x, Lucene41]
>> >    at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:104)
>> >    at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
>> >    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:299)
>> >    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
>> >    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
>> >    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>> >    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
>> >    at org.apache.lucene.index.DirectoryReader.indexExists(DirectoryReader.java:322)
>> >
>> >
>> >I also tried instantiating Lucene40Codec directly to avoid using a named
>> >FilterCodec, but that codec apparently disallows writing to the index in
>> >Lucene 4.1:
>> >
>> >java.lang.UnsupportedOperationException: this codec can only be used for reading
>> >    at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:246)
>> >    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>> >    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:336)
>> >    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
>> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>> >    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
>> >    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:487)
>> >    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>> >    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>> >    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:357)
>> >    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
>> >    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
>> >    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
>> >    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
>> >    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
>> >    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
>> >    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:154)
>> >    at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:233)
>> >
>> >What am I doing wrong here?
>> >
>> >Thx,
>> >Vitaly
>> >
>> >On Wed, May 15, 2013 at 2:47 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> >
>> >> Yes. You can also force this by using IW.forceMerge(1), unless your index
>> >> already consists of only one segment. Another alternative is to use
>> >> IndexUpgrader, but this one would only merge if there are segments
>> >> created with an older Lucene version. You can change this by overriding
>> >> IndexUpgrader's merge policy to use all segments.
>> >>
>> >> You reminded me to open an issue to add the possibility to IndexUpgrader
>> >> to also "upgrade" segments using a different codec configuration, not just
>> >> those coming from an older Lucene version (which is possible to do).
>> >>
>> >> Uwe
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: uwe@thetaphi.de
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: Vitaly Funstein [mailto:vfunstein@gmail.com]
>> >> > Sent: Wednesday, May 15, 2013 11:36 PM
>> >> > To: java-user@lucene.apache.org
>> >> > Subject: Re: Toggling compression for stored fields
>> >> >
>> >> > Thanks for the quick reply, this is certainly good news. So just to
>> >> > clarify: doing a manual segment merge is optional when changing codecs,
>> >> > correct? I mean, I can just restart my application with a new codec
>> >> > config and let the regular, background merging task do the work of
>> >> > eventually converting all the data to the new format?
>> >> >
>> >> > On Wed, May 15, 2013 at 2:30 PM, Uwe Schindler <uwe@thetaphi.de>
>> >> > wrote:
>> >> >
>> >> > > Hi Vitaly,
>> >> > >
>> >> > > What you call an "index" is just a collection (a CompositeReader) of
>> >> > > atomic readers. They can be mixed regarding compression, just like you
>> >> > > could have a MultiReader with different indexes using different codecs.
>> >> > > Every atomic segment of an index can only have one stored fields format.
>> >> > > Once merging occurs, the uncompressed fields of e.g. an older atomic
>> >> > > segment get merged into a new segment with compression enabled. The
>> >> > > same can happen in the other direction. The codec is responsible for
>> >> > > encoding the data on disk and this includes the compression. When
>> >> > > merging segments, the data is uncompressed and recompressed as needed.
>> >> > > To improve performance, there are shortcuts to copy the data directly
>> >> > > if the codec does not change while merging.
>> >> > >
>> >> > > With Lucene 4.x, you are free to open an IndexWriter with a different
>> >> > > codec configuration and e.g. use IndexUpgrader or do a force merge
>> >> > > manually to merge all "old" segments and "recompress" them to a
>> >> > > different codec config. This has nothing to do with "reindexing" as
>> >> > > you are just changing the encoding of the exact same data on disk.
>> >> > >
>> >> > > Uwe
>> >> > >
>> >> > > -----
>> >> > > Uwe Schindler
>> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> > > http://www.thetaphi.de
>> >> > > eMail: uwe@thetaphi.de
>> >> > >
>> >> > >
>> >> > > > -----Original Message-----
>> >> > > > From: Vitaly Funstein [mailto:vfunstein@gmail.com]
>> >> > > > Sent: Wednesday, May 15, 2013 10:38 PM
>> >> > > > To: java-user@lucene.apache.org
>> >> > > > Subject: Toggling compression for stored fields
>> >> > > >
>> >> > > > Is it possible to have a mix of compressed and uncompressed
>> >> > > > documents within a single index? That is, can I load an index
>> >> > > > created with Lucene 4.0 into 4.1 and defer the decision of whether
>> >> > > > or not to use CompressingStoredFieldsFormat until a later time, or
>> >> > > > even go back and forth between compressed and uncompressed codecs,
>> >> > > > if needed? I thought at first the answer would be an unequivocal
>> >> > > > "no", but then how would one migrate data from 4.0 to 4.1 without
>> >> > > > a full reindex?
>> >> > >
>> >> > >
>> >> > >
>> >> > > ---------------------------------------------------------------------
>> >> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> > >
>> >> > >
>>
>> --
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, 28213 Bremen
>> http://www.thetaphi.de

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de