lucene-java-user mailing list archives

From Vitaly Funstein <vfunst...@gmail.com>
Subject Re: Toggling compression for stored fields
Date Wed, 15 May 2013 22:56:51 GMT
Uwe,

I may not be doing this correctly, but I tried to see what would happen if
I were to reopen an index created with a custom codec that disables
stored fields compression, and it doesn't seem to work. Here's how I
configure the writer to disable compression, prior to indexing:

        final StoredFieldsFormat sfFmt = new Lucene40StoredFieldsFormat();
        idxWriterCfg.setCodec(
            new FilterCodec("DisableStoreFieldCompressionCodec", new Lucene41Codec()) {

              @Override
              public StoredFieldsFormat storedFieldsFormat() {
                return sfFmt;
              }

            });

However, when an index that was created with this writer configuration is
opened, I get this exception:

Exception in thread "main" java.lang.IllegalArgumentException: A SPI class
of type org.apache.lucene.codecs.Codec with name
'DisableStoreFieldCompressionCodec' does not exist. You need to add the
corresponding JAR file supporting this SPI to your classpath.The current
classpath supports the following names: [Lucene40, Lucene3x, Lucene41]
    at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:104)
    at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:299)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.DirectoryReader.indexExists(DirectoryReader.java:322)


I also tried instantiating Lucene40Codec directly to avoid using a named
FilterCodec, but that codec apparently disallows writing to an index in
Lucene 4.1:

java.lang.UnsupportedOperationException: this codec can only be used for
reading
    at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:246)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:336)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:487)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:357)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:154)
    at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:233)

What am I doing wrong here?

Thx,
Vitaly
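
A possible reading of the first exception, offered as a sketch and not as a confirmed diagnosis: the codec name is recorded with each segment, and on open the reader has only that name to go on, so it must resolve it through a by-name lookup. An anonymous FilterCodec subclass is known to the writer as an object, but it is not registered anywhere under its name. In Lucene the registration is done by listing a public codec class with a no-argument constructor in META-INF/services/org.apache.lucene.codecs.Codec. The toy model below imitates only the lookup behavior; ToyCodec, Registry and CodecNameLookupSketch are hypothetical names and are not Lucene classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of name-based codec lookup. Hypothetical; not Lucene's real classes. */
final class CodecNameLookupSketch {

  /** Hypothetical stand-in for a codec: only its name matters here. */
  static class ToyCodec {
    final String name;

    ToyCodec(String name) {
      this.name = name;
    }
  }

  /** Hypothetical registry playing the role of the by-name SPI lookup table. */
  static final class Registry {
    private final Map<String, Supplier<ToyCodec>> byName = new LinkedHashMap<>();

    void register(String name, Supplier<ToyCodec> factory) {
      byName.put(name, factory);
    }

    /** Resolves a name read from segment metadata; fails if nothing is registered for it. */
    ToyCodec forName(String name) {
      Supplier<ToyCodec> factory = byName.get(name);
      if (factory == null) {
        throw new IllegalArgumentException(
            "Codec '" + name + "' does not exist; known names: " + byName.keySet());
      }
      return factory.get();
    }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.register("Lucene40", () -> new ToyCodec("Lucene40"));
    registry.register("Lucene41", () -> new ToyCodec("Lucene41"));

    // Write side: an anonymous subclass is usable because the writer holds the instance.
    ToyCodec custom = new ToyCodec("DisableStoreFieldCompressionCodec") {};
    String nameStoredInSegment = custom.name;

    // Read side: only the stored name is available, and nothing is registered under it.
    try {
      registry.forName(nameStoredInSegment);
    } catch (IllegalArgumentException e) {
      System.out.println("lookup failed: " + e.getMessage());
    }

    // Once a factory is registered under that name, the same lookup succeeds.
    registry.register(nameStoredInSegment, () -> new ToyCodec(nameStoredInSegment));
    System.out.println("resolved: " + registry.forName(nameStoredInSegment).name);
  }
}
```

If this reading is right, the equivalent step in a real setup would be to turn the anonymous codec into a named top-level class and register it as described above, so that the name written into the segments can be resolved when the index is reopened.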

On Wed, May 15, 2013 at 2:47 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Yes. You can also force this by using IW.forceMerge(1), unless your index
> already consists of only one segment. Another alternative is to
> use IndexUpgrader, but this one would only merge if there are segments
> created with an older Lucene version. You can change this by overriding
> IndexUpgrader's merge policy to use all segments.
>
> You reminded me to open an issue to add the possibility to IndexUpgrader
> to also "upgrade" segments using a different codec configuration, not just
> coming from an older Lucene version (which is possible to do).
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Vitaly Funstein [mailto:vfunstein@gmail.com]
> > Sent: Wednesday, May 15, 2013 11:36 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Toggling compression for stored fields
> >
> > Thanks for the quick reply, this is certainly good news. So just to clarify
> > - doing a manual segment merge is optional when changing codecs, correct? I
> > mean, I can just restart my application with a new codec config and let the
> > regular, background merging task do the work of eventually converting all
> > the data to the new format?
> >
> > On Wed, May 15, 2013 at 2:30 PM, Uwe Schindler <uwe@thetaphi.de>
> > wrote:
> >
> > > Hi Vitaly,
> > >
> > > what you call an "index" is just a collection (a CompositeReader) of
> > > atomic readers. They can be mixed regarding compression, just like you
> > > could have a MultiReader with different indexes using different codecs.
> > > Every atomic segment of an index can only have one stored fields format.
> > > Once merging occurs, the uncompressed fields of e.g. an older atomic
> > > segment get merged into a new segment with compression enabled. The
> > > same can happen in the other direction. The codec is responsible for
> > > encoding the data on disk and this includes the compression. When
> > > merging segments, the data is uncompressed and recompressed as needed.
> > > To improve performance, there are shortcuts to copy the data directly
> > > if the codec does not change while merging.
> > >
> > > With Lucene 4.x, you are free to open an IndexWriter with a different
> > > codec configuration and e.g. use IndexUpgrader or do a force merge
> > > manually to merge all "old" segments and "recompress" them to a
> > > different codec config. This has nothing to do with "reindexing" as
> > > you are just changing the encoding of the exact same data on disk.
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > >
> > > > -----Original Message-----
> > > > From: Vitaly Funstein [mailto:vfunstein@gmail.com]
> > > > Sent: Wednesday, May 15, 2013 10:38 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Toggling compression for stored fields
> > > >
> > > > Is it possible to have a mix of compressed and uncompressed
> > > > documents within a single index? That is, can I load an index
> > > > created with Lucene 4.0 into 4.1 and defer the decision of whether
> > > > or not to use CompressingStoredFieldsFormat until a later time, or
> > > > even go back and forth between compressed and uncompressed codecs,
> > > > if needed? I thought at first the answer would be an unequivocal
> > > > "no", but then how would one migrate data from 4.0 to 4.1 without
> > > > a full reindex?
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
