lucene-solr-user mailing list archives

From Alejandro Marqués Rodríguez <amarq...@paradigmatecnologico.com>
Subject Re: Stemming - disable at query time - reg.
Date Mon, 19 Apr 2010 10:26:16 GMT
Hi Naga,

I think you should add the same stemming filter (SnowballPorterFilterFactory)
to the query analyzer:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory"
               ignoreCase="true"
               words="stopwords.txt"
               enablePositionIncrements="true"
               />
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
protected="protwords.txt"/>
     </analyzer>
     <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory"
               ignoreCase="true"
               words="stopwords.txt"
               enablePositionIncrements="true"
               />
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
        <!-- added: stem query terms the same way as the indexed terms -->
        <filter class="solr.SnowballPorterFilterFactory" language="English"
protected="protwords.txt"/>
     </analyzer>
   </fieldType>

That way stemming is also applied to the query, so it searches for "work"
instead of "working", and you should therefore be able to retrieve both
"worked" and "working".

You can see the transformations that the analyzers apply at index and query
time on the "analysis" page of the Solr admin interface, so you can check why
a given query doesn't match some text.

With your current configuration (no stemming in the query analyzer), I think
you get:

Index: Working -> Work (Applies stemming)
Query: Working -> Working (Doesn't apply stemming)

So the query term "working" won't match the indexed term "work".
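
To make the mismatch concrete, here is a minimal sketch in Python. It is only an illustration: `toy_stem`, `analyze` and `matches` are hypothetical helpers, and the crude suffix stripper merely stands in for SnowballPorterFilterFactory (it is not the real Snowball algorithm). It assumes the index analyzer always stems and only varies whether the query is stemmed.

```python
# Toy illustration only: a crude suffix stripper stands in for Solr's
# SnowballPorterFilterFactory. It is NOT the real Snowball algorithm.
def toy_stem(token):
    for suffix in ("ing", "ed"):
        # Strip the suffix only if a reasonably long stem remains.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text, stem):
    # Rough stand-in for the whitespace tokenizer plus lowercase filter.
    tokens = text.lower().split()
    return [toy_stem(t) if stem else t for t in tokens]


def matches(doc, query, stem_query):
    # The index analyzer always stems; the query analyzer may or may not.
    indexed = set(analyze(doc, stem=True))
    terms = analyze(query, stem=stem_query)
    return all(t in indexed for t in terms)


docs = ["working", "worked"]

# Query analyzer without the stemming filter: "working" is looked up as-is.
without_stemming = [d for d in docs if matches(d, "working", stem_query=False)]

# Query analyzer with the stemming filter: "working" is looked up as "work".
with_stemming = [d for d in docs if matches(d, "working", stem_query=True)]

print(without_stemming)  # []
print(with_stemming)     # ['working', 'worked']
```

In this sketch an unstemmed query for "work" still matches both documents, which is consistent with what you observed.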

Regards


2010/4/19 Naga Darbha <ndarbha@opentext.com>

> Hi Mitch,
>
> I have defined my field like:
>
>    <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>      </analyzer>
>      <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> I have indexed two documents with "working" and "worked" values and when I
> search for "working" it is not giving me any results, whereas when I search
> for "work" it is giving me two results.
>
> What should I be doing to get the query results for "working"?
>
> regards,
> Naga
>
> -----Original Message-----
> From: Naga Darbha [mailto:ndarbha@opentext.com]
> Sent: Monday, April 19, 2010 2:45 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Stemming - disable at query time - reg.
>
> Thank you Mitch! I will try that.
>
> regards,
> Naga
>
>
>
> -----Original Message-----
> From: MitchK [mailto:mitch91@web.de]
> Sent: Monday, April 19, 2010 2:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Stemming - disable at query time - reg.
>
>
> Naga,
>
> 1) Yes, it is possible.
> <fieldType name="myText" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>          ....
>          <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>          ....
>       </analyzer>
>      <analyzer type="query">
>         ... define those filters which you want to apply at query-time
>      </analyzer>
> </fieldType>
>
> 2) I am not sure whether I understand your question right:
> You do not need to copyField your myText-field, if it is okay for you that
> the indexed data of the myText-field is stemmed and the query is not.
> For example: if the original data consists of the sentence "I am working"
> then it (maybe) looks like this after it is stemmed: "I am work". If you
> query against this with the term "working" there will be no match unless
> you stem your query string, too.
>
> Hope this helps.
>
> - Mitch
> --
> View this message in context:
> http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p729171.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42
