lucene-solr-user mailing list archives

From "Nemani, Raj" <Raj.Nem...@turner.com>
Subject RE: question on solr.ASCIIFoldingFilterFactory
Date Tue, 05 Apr 2011 17:33:08 GMT
Here is the field type definition for the ‘text’ field type, which is what I am using for the
indexed fields. Can you spot any filter that could obviously be causing the issue?

---------------------------------------------------------------------------

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

 

From: Steven A Rowe [mailto:sarowe@syr.edu] 
Sent: Tuesday, April 05, 2011 12:28 PM
To: solr-user@lucene.apache.org
Subject: RE: question on solr.ASCIIFoldingFilterFactory

 

I added this test method locally to TestASCIIFoldingFilter.java in the Lucene/Solr 3.1.0 source
tree, and it passed, so the filter is not the problem. The Solr factory certainly isn't either;
it's just a wrapper. I second Ludovic's question: you must have other filters configured.

 

  public void testPluralNotTrimmed() throws Exception {
    TokenStream stream = new WhitespaceTokenizer(TEST_VERSION_CURRENT,
        new StringReader("después Imágenes"));
    ASCIIFoldingFilter filter = new ASCIIFoldingFilter(stream);
    CharTermAttribute termAtt = filter.getAttribute(CharTermAttribute.class);

    assertTermEquals("despues", filter, termAtt);
    assertTermEquals("Imagenes", filter, termAtt);
  }

 

Steve

 

 
