Date: Sun, 14 Apr 2013 16:13:19 -0700
Subject: Re: DiskDocValuesFormat
From: Wei Wang
To: java-user@lucene.apache.org

That makes sense. BTW, I checked the jar file. Exactly as you pointed out,
the services files only contain info from lucene-core, without the codecs
from lucene-codecs. After adding the Maven plugin, it is now running.

Thanks!

On Sun, Apr 14, 2013 at 3:26 PM, Uwe Schindler wrote:

> Hi,
>
> > Thanks for the hint. I will double check the jar file.
> >
> > I am just a bit puzzled: if the indexing step recognizes the 'Disk'
> > codec and creates the index properly, it seems the merge step that
> > immediately follows indexing should also recognize the 'Disk' codec.
>
> This is easy to explain: By creating the custom Lucene42 Codec as a Class,
> you just define the disk format on the initial write (when *new* segments
> are written with new documents). While merging (or force-merging), Lucene
> uses the metadata that's already on disk for the segments to merge. The
> metadata on disk contains the names of all codec components used. This
> metadata is also used when opening IndexReaders. It will then use SPI and
> META-INF/services files to look up the class that is responsible for e.g.
> the "Disk" docvalues format.
> Without the META-INF data, Lucene cannot look up the segment codecs.
>
> Uwe
>
> > On Sun, Apr 14, 2013 at 3:03 PM, Uwe Schindler wrote:
> >
> > > Are you sure that you use the ServicesResourceTransformer in your
> > > shade config?
> > >
> > > http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
> > >
> > > The problem is: lucene-core.jar and lucene-codecs.jar both contain
> > > codec components and their classes are listed in META-INF/services. If
> > > those files are not correctly merged through this resource
> > > transformer, the resulting JAR file will miss some codecs.
> > >
> > > You can check correctness by opening the final JAR file with a ZIP
> > > program and checking that all files in META-INF/services contain all
> > > entries merged from all Lucene JARs.
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Wei Wang [mailto:welshwang@gmail.com]
> > > > Sent: Sunday, April 14, 2013 11:49 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: DiskDocValuesFormat
> > > >
> > > > Yes, I used the Maven Shade plugin, but still have this problem. Here
> > > > is the Maven output during packaging:
> > > >
> > > > [INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
> > > > [INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
> > > > [INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
> > > > [INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
> > > > [INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
> > > > [INFO] Including junit:junit:jar:4.11 in the shaded jar.
> > > > [INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
> > > > [INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
> > > > [INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded jar.
> > > > [INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the shaded jar.
> > > > [INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded jar.
> > > > [INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
> > > > [INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in the shaded jar.
> > > > [INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
> > > > [INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
> > > > [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
> > > > [INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
> > > > [INFO] Replacing original artifact with shaded artifact.
> > > >
> > > > On Sun, Apr 14, 2013 at 2:40 PM, Uwe Schindler wrote:
> > > >
> > > > > If you create a single JAR file out of multiple Lucene JAR files,
> > > > > use a tool like the Maven Shade plugin; otherwise, the required
> > > > > metadata files (META-INF/services) in the JAR files are not
> > > > > correctly merged together.
> > > > >
> > > > > -----
> > > > > Uwe Schindler
> > > > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > > http://www.thetaphi.de
> > > > > eMail: uwe@thetaphi.de
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Wei Wang [mailto:welshwang@gmail.com]
> > > > > > Sent: Sunday, April 14, 2013 11:30 PM
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: Re: DiskDocValuesFormat
> > > > > >
> > > > > > Hi Adrien,
> > > > > >
> > > > > > The Lucene42Codec works well to generate the index with
> > > > > > DiskDocValuesFormat.
> > > > > > But when I tried to merge the index segments by calling:
> > > > > >
> > > > > > IndexWriter iw = new IndexWriter(directory, iw_config);
> > > > > > ...
> > > > > > iw.forceMerge(1);
> > > > > >
> > > > > > I got the following error message:
> > > > > >
> > > > > > Caused by: java.lang.IllegalArgumentException: A SPI class of type
> > > > > > org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist.
> > > > > > You need to add the corresponding JAR file supporting this SPI to
> > > > > > your classpath.The current classpath supports the following names:
> > > > > > [Lucene42]
> > > > > >
> > > > > > Any hint on this classpath problem? I have created a single jar file
> > > > > > that has all necessary dependencies, such as lucene-codecs-4.2.0.jar.
> > > > > > And I assume the indexing step works well, so Lucene already knows
> > > > > > the format with name 'Disk'.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand wrote:
> > > > > >
> > > > > > > Hi Wei,
> > > > > > >
> > > > > > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang wrote:
> > > > > > > > I am trying to use DiskDocValuesFormat for a particular
> > > > > > > > BinaryDocValuesField. It seems there are no good examples
> > > > > > > > showing how to do this. The only hint I got from various docs
> > > > > > > > and forums is to set some codec in IndexWriter. Could someone
> > > > > > > > give a few lines of code and show how to set
> > > > > > > > DiskDocValuesFormat?
> > > > > > >
> > > > > > > Lucene42Codec can be extended to specify the doc values format to
> > > > > > > use on a per-field basis.
> > > > > > > For example:
> > > > > > >
> > > > > > > final Codec codec = new Lucene42Codec() {
> > > > > > >   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
> > > > > > >   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
> > > > > > >   @Override
> > > > > > >   public DocValuesFormat getDocValuesFormatForField(String field) {
> > > > > > >     if ("dv_mem".equals(field)) {
> > > > > > >       // use Lucene42 for "dv_mem"
> > > > > > >       return memoryDVFormat;
> > > > > > >     } else {
> > > > > > >       // use Disk otherwise
> > > > > > >       return diskDVFormat;
> > > > > > >     }
> > > > > > >   }
> > > > > > > };
> > > > > > >
> > > > > > > Then just pass this Codec instance to your IndexWriterConfig.
> > > > > > >
> > > > > > > --
> > > > > > > Adrien
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
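
Uwe's advice in the thread above, to open the final JAR and check that the
files in META-INF/services contain the entries from all Lucene JARs, can
also be scripted. What follows is only a minimal sketch using plain JDK
classes, not something taken from the thread: the class name ServicesCheck
and the method listProviders are hypothetical names chosen for illustration.
It prints the provider classes that a JAR registers for
org.apache.lucene.codecs.DocValuesFormat, the service type named in the
error message.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Lucene or Maven): lists the providers
// that a JAR registers under META-INF/services for one service interface.
public class ServicesCheck {

    static List<String> listProviders(String jarPath, String serviceInterface)
            throws IOException {
        List<String> providers = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry("META-INF/services/" + serviceInterface);
            if (entry == null) {
                return providers; // the JAR has no services file for this interface
            }
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Services files allow '#' comments and blank lines.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        providers.add(line);
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ServicesCheck <path-to-jar>");
            return;
        }
        // Service type taken from the error message quoted in the thread.
        String service = "org.apache.lucene.codecs.DocValuesFormat";
        for (String provider : listProviders(args[0], service)) {
            System.out.println(provider);
        }
    }
}
```

Run against the shaded JAR, a listing that shows only the lucene-core
entry and no class for the Disk docvalues format would be consistent with
the services files having been overwritten rather than merged during
shading, which is the situation Wei reports above.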