lucene-solr-dev mailing list archives

From Roopesh P Raj <roop...@digitalglue.in>
Subject Re: Doub't in the way lucene works
Date Thu, 14 Feb 2008 10:28:42 GMT
Hi Stu,

Thank you very much for your reply. It cleared up a lot of things.

Thanks,
Roopesh

Stu Hood wrote:
> Hello Roopesh,
>
> What you are seeing is called 'stemming'. Stemming reduces each token to
> a language-specific root form (its stem), mostly by stripping suffixes.
> So, for instance, when you search for 'attach', you also match
> 'attachment', because both words are reduced to the same English stem.
>
> Newsletter is an interesting example: you will never get a match when you
> search for 'letter', because stemming only strips suffixes and never
> removes the front of a word. The fact that you don't get a match for
> 'news' is a bit more complicated. The stemmer does not reduce 'newsletter'
> to 'news', because 'letter' is not a suffix it knows how to strip: it
> applies fixed suffix rules and does not split compound words (whereas in
> the attach/attachment case, '-ment' is a regular English suffix). The
> index therefore only contains the single term 'newsletter', and a query
> for 'news' has nothing to match.
>
> I can't find any good Solr-specific stemming links, but check out the
> Wikipedia page: http://en.wikipedia.org/wiki/Stemming
>
> Thanks,
> Stu
>
>
> -----Original Message-----
> From: Roopesh P Raj <roopesh@digitalglue.in>
> Sent: Wednesday, February 13, 2008 1:43am
> To: solr-dev@lucene.apache.org
> Subject: Doub't in the way lucene works
>
> Hi,
>
> I am using Solr in my project. My schema is almost the same as the one
> in the example folder that ships with the Solr download. Most of the
> fields that I use are of type "text", and the rest are of type "string".
>
> Some of the search results are as follows:
>
> When I search with the query "attach", documents containing "attach",
> "attachment" or "attachments" are returned.
> When the search string is "attachment", documents containing "attach",
> "attachment" or "attachments" are returned as well.
>
> When I search for "newsletter", documents containing "newsletter" are
> returned.
> But when I search for "news", no results appear.
> When I search for "letter", there are no results either.
>
> Why does this happen?
> Why does Lucene not return documents containing "newsletter" when the
> search string is "letter" or "news"?
>
> I am also pasting the "text" fieldType declaration below. Please help me.
>
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> Regards
> Roopesh
>
>
> ------------------
> DigitalGlue, India


------------------
DigitalGlue, India
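
The behaviour discussed in this thread can be reproduced with a small sketch. The Python below is not the Porter algorithm behind solr.EnglishPorterFilterFactory; it is a toy suffix stripper with a made-up rule list (SUFFIXES), and the names toy_stem, analyze, build_index and search are hypothetical. It assumes, as in the quoted schema, that the same stemming runs at index time and at query time, and that a term only matches when it is identical after analysis.

```python
# Toy illustration only: NOT the Porter algorithm used by Solr.
# SUFFIXES is a hypothetical rule list, checked longest first.
SUFFIXES = ("ments", "ment", "s")


def toy_stem(token: str) -> str:
    """Lowercase a token and strip the first matching suffix, if any."""
    token = token.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Split on whitespace, then stem every token."""
    return [toy_stem(tok) for tok in text.split()]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each stemmed term to the ids of the documents that contain it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents whose terms equal any stemmed query term."""
    hits: set[int] = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {
    1: "please attach the file",
    2: "see the attachment",
    3: "two attachments enclosed",
    4: "monthly newsletter",
}
index = build_index(docs)

for query in ("attach", "attachment", "newsletter", "news", "letter"):
    print(query, "->", sorted(search(index, query)))
```

In this sketch "attach", "attachment" and "attachments" all collapse to the stem "attach", so either query finds documents 1 to 3. "newsletter" is indexed as one whole term, so "news" and "letter" find nothing: no suffix rule turns "newsletter" into either of them, and matching is on whole terms, not substrings.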



