lucene-solr-user mailing list archives

From "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)" <external.Ravi.Tamin...@us.bosch.com>
Subject RE: Lower/UpperCase Issue
Date Wed, 09 Jul 2014 20:03:51 GMT
Do I need to use a different algorithm instead of porter stemming? Can you suggest anything
you have in mind?

-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID] 
Sent: Wednesday, July 09, 2014 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Lower/UpperCase Issue

Hi,

The Analysis admin page will tell you the truth. Just a guess: the porter stem filter could be
"case sensitive", and that may cause the difference. I am pretty sure porter stemming algorithms
are designed to work on lowercase input.

By the way, you have two lowercase filters defined in the index analyzer.
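
A minimal sketch of what that reordering could look like, assuming the guess above is right
(the Lucene documentation for PorterStemFilter does say its input must already be lowercase).
The filters and attribute values are copied from the field type quoted below; only the order
changes, and the duplicate lowercase filter is dropped. Treat it as something to check on the
Analysis page, not a verified fix.

```xml
<!-- Sketch only: same filters as the quoted text_general type, reordered so that
     lowercasing happens before stemming. The ordering is an assumption to test. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first, so "BALANCER" and "Balancer" reach the stemmer as the same token -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Same relative order as the index analyzer: lowercase, stop words, then stem -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
</fieldType>
```

After any change to the index analyzer, the documents have to be reindexed before the new
chain affects what is stored in the index.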

Ahmet



On Wednesday, July 9, 2014 7:18 PM, "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)"
<external.Ravi.Taminidi@us.bosch.com> wrote:
I have a situation here: when I search with "BALANCER", the results differ from those for
"Balancer", and the order is different. When I search "BALANCER", the documents with
upper case come first in the list, and for "Balancer" they are in a different order.

I am confused by this behavior. Has anyone seen the same issue, or am I missing something?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
</fieldType>

Example queries:

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

Thanks

Ravi
