lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jamie <ja...@stimulussoft.com>
Subject Re: Search query problem
Date Fri, 08 Jan 2010 21:27:58 GMT
Hi Ian / Will

Thanks. Surely, the Porter Stemmer should not stem proper noun's. i.e. 
it could check the capitalization of the first letter of a word and 
whether or not the word is the start of sentence. If so, it could choose 
not apply any stemming. Or am I completely out of whack?

Jamie


Ian Lea wrote:
> Looks like PorterStemFilter converts "Lowe's" to low.  Not very surprising.
>
> Options include
>
> .  Drop the stemming
>
> .  Index stemmed and non-stemmed variants and search both, maybe
> boosting the non-stemmed variant.
>
> If you really want exact matches only, you may also/instead want
> untokenized fields.  Apostrophes etc can be a problem.  Look into what
> analyzers do and use Luke to see what is indexed.
>
> --
> Ian.
>
>
> On Fri, Jan 8, 2010 at 8:01 PM, Jamie <jamie@stimulussoft.com> wrote:
>   
>> Hi There
>>
>> We are trying to search for the exact word "Lowe's" across a large set of
>> indexed data. Our results include everything with "low" in it. Thus, we are
>> receiving a much larger data set that we expected. The data is indexing
>> using the analyzer:
>>           TokenStream result = new StandardTokenizer(reader);
>>           result = new StandardFilter(result);
>>           result = new LowerCaseFilter(result);
>>           result = new StopFilter(result, stopTable);
>>           result = new PorterStemFilter(result);
>>           return result;
>>
>> Thanks
>>
>> Jamie
>>
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


-- 
Stimulus Software - MailArchiva
Email Archiving And Compliance
USA Tel: +1-713-343-8824 ext 100
UK Tel: +44-20-80991035 ext 100
Email:  jamie@stimulussoft.com
Web: http://www.mailarchiva.com
To receive MailArchiva Enterprise Edition product announcements, send a message to: <mailarchiva-enterprise-edition-subscribe@stimulussoft.com>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message