lucene-dev mailing list archives

From Koorosh Vakhshoori <kvakhsho...@gmail.com>
Subject TestAllAnalyzersHaveFactories fails when looking for a new Factory class, is it class loader issue?
Date Mon, 16 Nov 2015 23:55:40 GMT
Hi all,
  I am in the process of creating a patch for Lucene, but I can't get
the JUnit test TestAllAnalyzersHaveFactories to pass. I hope this is the
right forum for help; if not, kindly direct me to the correct one.
Any help is greatly appreciated!

  First, some background. The patch builds on Ted Sullivan's work in
SOLR-7136. It is an enhanced version of AutoPhrase, which I would like
to submit to the community. The code includes a new TokenFilter,
AutoPhrasingTokenFilter, with JUnit tests. I have created the following
package:

org.apache.lucene.analysis.autophrase

This package contains the following class files:

AutoPhraseDetector.java
AutoPhrasingTokenFilter.java
AutoPhrasingTokenFilterFactory.java
package-info.java

When run under ant, the test TestAllAnalyzersHaveFactories fails with
the following output (I have added some print statements for
debugging):
============================================================
-test:
   [junit4] <JUnit4> says ????! Master seed: 86F1C35C6CE11696
   [junit4] Your default console's encoding may not display certain
unicode glyphs: US-ASCII
   [junit4] Executing 1 suite with 1 JVM.
   [junit4]
   [junit4] Started J0 PID(15156@localhost).
   [junit4] Suite: org.apache.lucene.analysis.core.TestAllAnalyzersHaveFactories
   [junit4]   1> clazzName: IndicNormalizationFilter
   [junit4]   1> simpleName: IndicNormalization
   [junit4]   1> clazzName: HyphenationCompoundWordTokenFilter
   [junit4]   1> simpleName: HyphenationCompoundWord
   [junit4]   1> clazzName: DictionaryCompoundWordTokenFilter
   [junit4]   1> simpleName: DictionaryCompoundWord
   [junit4]   1> clazzName: BulgarianStemFilter
   [junit4]   1> simpleName: BulgarianStem
   [junit4]   1> clazzName: ShingleFilter
   [junit4]   1> simpleName: Shingle
   [junit4]   1> clazzName: ReverseStringFilter
   [junit4]   1> simpleName: ReverseString
   [junit4]   1> clazzName: GreekLowerCaseFilter
   [junit4]   1> simpleName: GreekLowerCase
   [junit4]   1> clazzName: GreekStemFilter
   [junit4]   1> simpleName: GreekStem
   [junit4]   1> clazzName: HungarianLightStemFilter
   [junit4]   1> simpleName: HungarianLightStem
   [junit4]   1> clazzName: GermanNormalizationFilter
   [junit4]   1> simpleName: GermanNormalization
   [junit4]   1> clazzName: GermanLightStemFilter
   [junit4]   1> simpleName: GermanLightStem
   [junit4]   1> clazzName: GermanMinimalStemFilter
   [junit4]   1> simpleName: GermanMinimalStem
   [junit4]   1> clazzName: GermanStemFilter
   [junit4]   1> simpleName: GermanStem
   [junit4]   1> clazzName: EnglishPossessiveFilter
   [junit4]   1> simpleName: EnglishPossessive
   [junit4]   1> clazzName: EnglishMinimalStemFilter
   [junit4]   1> simpleName: EnglishMinimalStem
   [junit4]   1> clazzName: PorterStemFilter
   [junit4]   1> simpleName: PorterStem
   [junit4]   1> clazzName: KStemFilter
   [junit4]   1> simpleName: KStem
   [junit4]   1> clazzName: ItalianLightStemFilter
   [junit4]   1> simpleName: ItalianLightStem
   [junit4]   1> clazzName: HindiStemFilter
   [junit4]   1> simpleName: HindiStem
   [junit4]   1> clazzName: HindiNormalizationFilter
   [junit4]   1> simpleName: HindiNormalization
   [junit4]   1> clazzName: RussianLightStemFilter
   [junit4]   1> simpleName: RussianLightStem
   [junit4]   1> clazzName: ClassicFilter
   [junit4]   1> simpleName: Classic
   [junit4]   1> clazzName: StandardFilter
   [junit4]   1> simpleName: Standard
   [junit4]   1> clazzName: CzechStemFilter
   [junit4]   1> simpleName: CzechStem
   [junit4]   1> clazzName: ElisionFilter
   [junit4]   1> simpleName: Elision
   [junit4]   1> clazzName: DelimitedPayloadTokenFilter
   [junit4]   1> simpleName: DelimitedPayload
   [junit4]   1> clazzName: TokenOffsetPayloadTokenFilter
   [junit4]   1> simpleName: TokenOffsetPayload
   [junit4]   1> clazzName: NumericPayloadTokenFilter
   [junit4]   1> simpleName: NumericPayload
   [junit4]   1> clazzName: TypeAsPayloadTokenFilter
   [junit4]   1> simpleName: TypeAsPayload
   [junit4]   1> clazzName: AutoPhrasingTokenFilter
   [junit4]   1> simpleName: AutoPhrasing
   [junit4]   2> NOTE: reproduce with: ant test
-Dtestcase=TestAllAnalyzersHaveFactories -Dtests.method=test
-Dtests.seed=86F1C35C6CE11696 -Dtests.slow=true -Dtests.locale=zh_CN
-Dtests.timezone=US/Samoa -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
   [junit4] ERROR   2.94s | TestAllAnalyzersHaveFactories.test <<<
   [junit4]    > Throwable #1: java.lang.IllegalArgumentException: A
SPI class of type org.apache.lucene.analysis.util.TokenFilterFactory
with name 'AutoPhrasing' does not exist. You need to add the
corresponding JAR file supporting this SPI to your classpath. The
current classpath supports the following names: [apostrophe,
arabicnormalization, arabicstem, bulgarianstem, brazilianstem,
cjkbigram, cjkwidth, soraninormalization, soranistem, commongrams,
commongramsquery, dictionarycompoundword, hyphenationcompoundword,
decimaldigit, lowercase, stop, type, uppercase, czechstem,
germanlightstem, germanminimalstem, germannormalization, germanstem,
greeklowercase, greekstem, englishminimalstem, englishpossessive,
kstem, porterstem, spanishlightstem, persiannormalization,
finnishlightstem, frenchlightstem, frenchminimalstem, irishlowercase,
galicianminimalstem, galicianstem, hindinormalization, hindistem,
hungarianlightstem, hunspellstem, indonesianstem, indicnormalization,
italianlightstem, latvianstem, asciifolding, capitalization,
codepointcount, fingerprint, hyphenatedwords, keepword, keywordmarker,
keywordrepeat, length, limittokencount, limittokenoffset,
limittokenposition, removeduplicates, stemmeroverride, trim, truncate,
worddelimiter, scandinavianfolding, scandinaviannormalization,
edgengram, ngram, norwegianlightstem, norwegianminimalstem,
patternreplace, patterncapturegroup, delimitedpayload, numericpayload,
tokenoffsetpayload, typeaspayload, portugueselightstem,
portugueseminimalstem, portuguesestem, reversestring,
russianlightstem, shingle, snowballporter, serbiannormalization,
classic, standard, swedishlightstem, synonym, turkishlowercase,
elision]
   [junit4]    >        at
__randomizedtesting.SeedInfo.seed([86F1C35C6CE11696:EA5FC86C21D7B6E]:0)
   [junit4]    >        at
org.apache.lucene.analysis.util.AnalysisSPILoader.lookupClass(AnalysisSPILoader.java:135)
   [junit4]    >        at
org.apache.lucene.analysis.util.TokenFilterFactory.lookupClass(TokenFilterFactory.java:42)
   [junit4]    >        at
org.apache.lucene.analysis.core.TestAllAnalyzersHaveFactories.test(TestAllAnalyzersHaveFactories.java:168)
   [junit4]    >        at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=CheapBastard,
sim=ClassicSimilarity, locale=zh_CN, timezone=US/Samoa
   [junit4]   2> NOTE: Linux 2.6.32-358.el6.x86_64 amd64/Oracle
Corporation 1.8.0_05
(64-bit)/cpus=4,threads=1,free=136794808,total=160432128
   [junit4]   2> NOTE: All tests run in this JVM:
[TestAllAnalyzersHaveFactories]
   [junit4] Completed [1/1] in 4.33s, 1 test, 1 error <<< FAILURES!
   [junit4]
   [junit4]
   [junit4] Tests with failures [seed: 86F1C35C6CE11696]:
   [junit4]   -
org.apache.lucene.analysis.core.TestAllAnalyzersHaveFactories.test
   [junit4]
   [junit4]
   [junit4] JVM J0:     0.66 ..     6.09 =     5.44s
   [junit4] Execution time total: 6.11 sec.
   [junit4] Tests summary: 1 suite, 1 test, 1 error
================================================

Running the test under the debugger in Eclipse gives the same error
message, but for a different factory name, 'DaitchMokitoffSoundex'. I
am not sure whether this is related to my issue.
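
Since the ant run and the Eclipse run fail on different names, comparing
the classpath of each run may help. A minimal sketch, assuming the
java.class.path system property reflects what the test JVM actually
uses (ClasspathDump and entries are hypothetical names used only here).
Unlike the cast to URLClassLoader, which works on the Java 8 JVM shown
in the log but fails on Java 9 and later, this does not depend on the
class loader type:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: lists classpath entries one per line.
class ClasspathDump {

  /** Splits a classpath string on the platform path separator, skipping empty entries. */
  static List<String> entries(String classPath) {
    List<String> result = new ArrayList<>();
    for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
      if (!entry.isEmpty()) {
        result.add(entry);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumption: java.class.path holds the classpath of the running JVM.
    for (String entry : entries(System.getProperty("java.class.path", ""))) {
      System.out.println(entry);
    }
  }
}
```

Running this once from ant and once from Eclipse and diffing the output
would show whether the two environments see the same build directories.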

My guess is that this is some sort of class loader issue. As I
understand it, the test makes sure that every TokenFilter has a
corresponding TokenFilterFactory; in this case that would be
AutoPhrasingTokenFilterFactory. I checked that the class is actually
compiled. The 'find' command shows the class file at:

build/analysis/common/classes/java/org/apache/lucene/analysis/autophrase/AutoPhrasingTokenFilterFactory.class

The location is similar to that of the other filter factories.
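
The compiled class being present does not by itself show that the SPI
lookup can find it. A minimal check, assuming the lookup follows the
standard Java service-provider convention (a META-INF/services resource
named after the factory base class, which the "SPI class" wording in
the error suggests), is to list what those resources contain on the
classpath. ServiceFileCheck and listedProviders are hypothetical names
used only for this sketch:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic, not part of Lucene.
class ServiceFileCheck {

  /** Collects every provider class name listed in META-INF/services/<serviceName>. */
  static List<String> listedProviders(ClassLoader cl, String serviceName) {
    List<String> providers = new ArrayList<>();
    try {
      for (URL url : Collections.list(cl.getResources("META-INF/services/" + serviceName))) {
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
          String line;
          while ((line = in.readLine()) != null) {
            int hash = line.indexOf('#');          // strip trailing comments
            if (hash >= 0) {
              line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
              providers.add(line);
            }
          }
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return providers;
  }

  public static void main(String[] args) {
    // Names taken from the error message and the package described above.
    String service = "org.apache.lucene.analysis.util.TokenFilterFactory";
    String wanted = "org.apache.lucene.analysis.autophrase.AutoPhrasingTokenFilterFactory";
    List<String> providers = listedProviders(ClassLoader.getSystemClassLoader(), service);
    System.out.println("providers listed: " + providers.size());
    System.out.println(wanted + " listed: " + providers.contains(wanted));
  }
}
```

If the new factory is not listed while the existing ones are, the
problem would be a missing registration rather than the class loader.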

I have added print statements and also run the test in the Eclipse
debugger. As far as I can see, the test code does see
AutoPhrasingTokenFilter. In TestAllAnalyzersHaveFactories.java, at the
line marked with '1>', the test code picks up the class
AutoPhrasingTokenFilter. However, when it gets to the line marked
'2>', it fails:

===========================================
  public void test() throws Exception {
1>    List<Class<?>> analysisClasses =
          TestRandomChains.getClassesForPackage("org.apache.lucene.analysis");

    ClassLoader cl = ClassLoader.getSystemClassLoader();

    URL[] urls = ((URLClassLoader)cl).getURLs();
//    System.out.println("ClassPath Start:");
    for(URL url: urls){
//      System.out.println(url.getFile());
    }
//    System.out.println("ClassPath Ends!");

    for (final Class<?> c : analysisClasses) {
      final int modifiers = c.getModifiers();
      if (
        // don't waste time with abstract classes
        Modifier.isAbstract(modifiers) || !Modifier.isPublic(modifiers)
        || c.isSynthetic() || c.isAnonymousClass()
        || c.isMemberClass() || c.isInterface()
        || testComponents.contains(c)
        || crazyComponents.contains(c)
        || oddlyNamedComponents.contains(c)
        || c.isAnnotationPresent(Deprecated.class) // deprecated ones are typically back compat hacks
        || !(Tokenizer.class.isAssignableFrom(c)
             || TokenFilter.class.isAssignableFrom(c)
             || CharFilter.class.isAssignableFrom(c))
      ) {
        continue;
      }

      Map<String,String> args = new HashMap<>();
      args.put("luceneMatchVersion", Version.LATEST.toString());

      if (Tokenizer.class.isAssignableFrom(c)) {
        String clazzName = c.getSimpleName();
        assertTrue(clazzName.endsWith("Tokenizer"));
        String simpleName = clazzName.substring(0, clazzName.length() - 9);
        assertNotNull(TokenizerFactory.lookupClass(simpleName));
        TokenizerFactory instance = null;
        try {
          instance = TokenizerFactory.forName(simpleName, args);
          assertNotNull(instance);
          if (instance instanceof ResourceLoaderAware) {
            ((ResourceLoaderAware) instance).inform(loader);
          }
          assertSame(c, instance.create().getClass());
        } catch (IllegalArgumentException e) {
          if (e.getCause() instanceof NoSuchMethodException) {
            // there is no corresponding ctor available
            throw e;
          }
          // TODO: For now pass because some factories have not yet a
          // default config that always works
        }
      } else if (TokenFilter.class.isAssignableFrom(c)) {
        String clazzName = c.getSimpleName();
        System.out.println("clazzName: " + clazzName);
        assertTrue(clazzName.endsWith("Filter"));
        String simpleName = clazzName.substring(0, clazzName.length()
            - (clazzName.endsWith("TokenFilter") ? 11 : 6));
        System.out.println("simpleName: " + simpleName);
2>        assertNotNull(TokenFilterFactory.lookupClass(simpleName));
=====================================================
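
The name derivation in the excerpt above can be tried in isolation. The
sketch below reproduces the suffix stripping from the test and then
lower-cases the result so it can be compared with the names in the
error message. The lower-casing step is an assumption inferred from the
all-lowercase names in that list, not taken from the Lucene source, and
FactoryNameSketch, simpleName and lookupKey are hypothetical names:

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
class FactoryNameSketch {

  /** Mirrors the test excerpt: drop "TokenFilter" (11 chars) or "Filter" (6 chars). */
  static String simpleName(String clazzName) {
    if (!clazzName.endsWith("Filter")) {
      throw new IllegalArgumentException("not a filter class name: " + clazzName);
    }
    int suffixLength = clazzName.endsWith("TokenFilter") ? 11 : 6;
    return clazzName.substring(0, clazzName.length() - suffixLength);
  }

  /** Assumption: supported names are stored in lower case, as the error message suggests. */
  static String lookupKey(String simpleName) {
    return simpleName.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String[] classNames = {
        "AutoPhrasingTokenFilter", "ShingleFilter", "DelimitedPayloadTokenFilter"
    };
    for (String clazzName : classNames) {
      String simple = simpleName(clazzName);
      System.out.println(clazzName + " -> " + simple + " -> " + lookupKey(simple));
    }
  }
}
```

Under that assumption the test is looking for 'autophrasing', which is
absent from the supported names even though 'shingle' and
'delimitedpayload' are present.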

Here is the code for the factory class:

package org.apache.lucene.analysis.autophrase;

/*
 * Copyright 2015 Synopsys, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you
 * may not use this file except in compliance with the License. You may
 * obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.ResourceLoader;
import org.apache.lucene.analysis.util.ResourceLoaderAware;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class AutoPhrasingTokenFilterFactory extends TokenFilterFactory
    implements ResourceLoaderAware {

  private CharArraySet phraseSets;
  private final String phraseSetFiles;
  private final boolean ignoreCase;
  private final boolean emitSingleTokens;
  private final boolean quotePhrase;
  private final boolean emitAmbiguousPhrases;

  private String replaceWhitespaceWith = null;

  public AutoPhrasingTokenFilterFactory(Map<String, String> initArgs) {
    super( initArgs );
    phraseSetFiles = get(initArgs, "phrases");
    ignoreCase = getBoolean( initArgs, "ignoreCase", false);
    emitSingleTokens = getBoolean( initArgs, "includeTokens", false );
    quotePhrase = getBoolean( initArgs, "quotePhrase", false );
    emitAmbiguousPhrases = getBoolean( initArgs, "emitAmbiguousPhrases", false );

    String replaceWhitespaceArg = initArgs.get( "replaceWhitespaceWith" );
    // Ignore an empty value; create() reads charAt( 0 ) and would otherwise throw.
    if (replaceWhitespaceArg != null && !replaceWhitespaceArg.isEmpty()) {
      replaceWhitespaceWith = replaceWhitespaceArg;
    }
  }

  @Override
  public void inform(ResourceLoader loader) throws IOException {
    if (phraseSetFiles != null) {
      phraseSets = getWordSet(loader, phraseSetFiles, ignoreCase);
    }
  }

  @Override
  public TokenStream create( TokenStream input ) {
    AutoPhrasingTokenFilter autoPhraseFilter =
        new AutoPhrasingTokenFilter( input, phraseSets, emitSingleTokens );
    if (replaceWhitespaceWith != null) {
      autoPhraseFilter.setReplaceWhitespaceWith(
          Character.valueOf( replaceWhitespaceWith.charAt( 0 ) ) );
    }
    // Doesn't make sense to emit phrases in double quotes if the
    // replaceWhitespaceWith character is set.
    if ((replaceWhitespaceWith == null) && quotePhrase) {
      autoPhraseFilter.setQuotePhrase(quotePhrase);
    }
    if (emitAmbiguousPhrases) {
      autoPhraseFilter.setEmitAmbiguousPhrases(emitAmbiguousPhrases);
    }
    return autoPhraseFilter;
  }
}

Thanks,

Koorosh
