lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: DiskDocValuesFormat
Date Sun, 14 Apr 2013 22:26:40 GMT
Hi,

> Thanks for the hint. I will double check the jar file.
> 
> I am just a bit puzzled: if the indexing step recognizes the 'Disk' codec and
> creates the index properly, it seems the merge step that immediately follows
> indexing should also recognize the 'Disk' codec.

This is easy to explain: by creating the custom Lucene42 Codec as a class, you only define
the disk format for the initial write (when *new* segments are written with new documents).
While merging (or force-merging), Lucene uses the metadata that's already on disk for the
segments to merge. The metadata on disk contains the names of all codec components used. This
metadata is also used when opening IndexReaders. Lucene then uses SPI and the META-INF/services
files to look up the class that is responsible for, e.g., the "Disk" docvalues format. Without
the META-INF data, Lucene cannot look up the segment codecs.
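
To see which names the SPI lookup can resolve from a given JAR, the META-INF/services
files can also be listed programmatically instead of with a ZIP program. The following is
only a minimal sketch using the plain JDK (no Lucene classes); the class name
ServiceFileLister and the method listServices are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Lucene): lists the SPI registrations in a JAR.
public class ServiceFileLister {

    private static final String PREFIX = "META-INF/services/";

    /** Maps each META-INF/services file name to the implementation classes it lists. */
    static Map<String, List<String>> listServices(String jarPath) throws IOException {
        Map<String, List<String>> services = new TreeMap<String, List<String>>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(PREFIX)) {
                    continue; // not a service registration file
                }
                List<String> impls = new ArrayList<String>();
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        int hash = line.indexOf('#'); // '#' starts a comment in service files
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            impls.add(line);
                        }
                    }
                }
                services.put(name.substring(PREFIX.length()), impls);
            }
        }
        return services;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ServiceFileLister <path-to-jar>");
            return;
        }
        for (Map.Entry<String, List<String>> service : listServices(args[0]).entrySet()) {
            System.out.println(service.getKey());
            for (String impl : service.getValue()) {
                System.out.println("  " + impl);
            }
        }
    }
}
```

Run against the shaded JAR, the entry org.apache.lucene.codecs.DocValuesFormat should list
the docvalues format classes from both lucene-core and lucene-codecs. If only the Lucene42
class shows up there, the services files were overwritten rather than merged, which matches
the "[Lucene42]" list in the error message.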

Uwe

> On Sun, Apr 14, 2013 at 3:03 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> 
> > Are you sure that you use the ServicesResourceTransformer in your
> > shade config?
> >
> >
> > http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
> >
> > The problem is: lucene-core.jar and lucene-codecs.jar both contain
> > codec components and their classes are listed in META-INF/services. If
> > those files are not correctly merged through this resource
> > transformer, the resulting JAR file will miss some codecs.
> >
> > You can check correctness by opening the final JAR file with a ZIP
> > program and checking that every file in META-INF/services contains the
> > entries merged from all Lucene JARs.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: Wei Wang [mailto:welshwang@gmail.com]
> > > Sent: Sunday, April 14, 2013 11:49 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: DiskDocValuesFormat
> > >
> > > Yes, I used Maven Shade plugin, but still have this problem. Here is
> > > the Maven output during packaging:
> > >
> > > [INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
> > > [INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
> > > [INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
> > > [INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
> > > [INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
> > > [INFO] Including junit:junit:jar:4.11 in the shaded jar.
> > > [INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
> > > [INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
> > > [INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded jar.
> > > [INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the shaded jar.
> > > [INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded jar.
> > > [INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
> > > [INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in the shaded jar.
> > > [INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
> > > [INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
> > > [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
> > > [INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
> > > [INFO] Replacing original artifact with shaded artifact.
> > >
> > > On Sun, Apr 14, 2013 at 2:40 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > >
> > > > If you create a single JAR file out of multiple Lucene JAR files,
> > > > use a tool like the Maven Shade plugin; otherwise, the required
> > > > metadata files (META-INF/services) in the JAR files are not correctly
> > > > merged together.
> > > >
> > > > -----
> > > > Uwe Schindler
> > > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > > eMail: uwe@thetaphi.de
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Wei Wang [mailto:welshwang@gmail.com]
> > > > > Sent: Sunday, April 14, 2013 11:30 PM
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Re: DiskDocValuesFormat
> > > > >
> > > > > Hi Adrien,
> > > > >
> > > > > > The Lucene42Codec works well to generate the index with
> > > > > > DiskDocValuesFormat. But when I tried to merge the index segments by
> > > > > > calling:
> > > > >
> > > > > IndexWriter iw = new IndexWriter(directory, iw_config); ...
> > > > > iw.forceMerge(1);
> > > > >
> > > > > I got the following error message:
> > > > >
> > > > > > Caused by: java.lang.IllegalArgumentException: A SPI class of type
> > > > > > org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist.
> > > > > > You need to add the corresponding JAR file supporting this SPI to
> > > > > > your classpath.The current classpath supports the following names:
> > > > > > [Lucene42]
> > > > >
> > > > > > Any hint on this classpath problem? I have created a single jar file
> > > > > > that has all necessary dependencies, such as lucene-codecs-4.2.0.jar.
> > > > > > And I assume the indexing step works well, so Lucene already knows the
> > > > > > format with name 'Disk'.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > > On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand <jpountz@gmail.com> wrote:
> > > > >
> > > > > > Hi Wei,
> > > > > >
> > > > > > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <welshwang@gmail.com> wrote:
> > > > > > > > I am trying to use DiskDocValuesFormat for a particular
> > > > > > > > BinaryDocValuesField. It seems there are no good examples showing
> > > > > > > > how to do this. The only hint I got from various docs and forums
> > > > > > > > is to set some codec in IndexWriter. Could someone give a few lines
> > > > > > > > of code and show how to set DiskDocValuesFormat?
> > > > > >
> > > > > > > Lucene42Codec can be extended to specify the doc values format to
> > > > > > > use on a per-field basis. For example:
> > > > > >
> > > > > > > final Codec codec = new Lucene42Codec() {
> > > > > > >   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
> > > > > > >   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
> > > > > > >   @Override
> > > > > > >   public DocValuesFormat getDocValuesFormatForField(String field) {
> > > > > > >     if ("dv_mem".equals(field)) {
> > > > > > >       // use Lucene42 for "dv_mem"
> > > > > > >       return memoryDVFormat;
> > > > > > >     } else {
> > > > > > >       // use Disk otherwise
> > > > > > >       return diskDVFormat;
> > > > > > >     }
> > > > > > >   }
> > > > > > > };
> > > > > >
> > > > > > Then just pass this Codec instance to your IndexWriterConfig.
> > > > > >
> > > > > > --
> > > > > > Adrien
> > > > > >
> > > > > > ------------------------------------------------------------------
> > > > > > --- To unsubscribe, e-mail:
> > > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail: java-user-
> help@lucene.apache.org
> > > > > >
> > > > > >

