lucene-java-user mailing list archives

From Shashi Kant <sk...@sloan.mit.edu>
Subject Re: Search query problem
Date Sat, 09 Jan 2010 12:19:07 GMT
Couldn't you just modify the PorterStemmer class to suit your requirements?
(We did, and gave it a list of ignore words and phrases specific to
our needs.)
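
A minimal sketch of the ignore-list idea, under stated assumptions: this is
not the modified PorterStemmer described above (that code is not shown in the
thread). The class name ProtectedWordStemmer, the toyStem method, and the
sample protected set are all hypothetical, and toyStem is a deliberately
simple suffix stripper standing in for a real stemmer. The only point is the
lookup that happens before any stemming.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: terms on a protected list bypass stemming entirely. */
final class ProtectedWordStemmer {

    /** Toy stand-in for a real stemmer. This is NOT the Porter algorithm. */
    static String toyStem(String term) {
        // Drop a possessive suffix first.
        if (term.endsWith("'s")) {
            term = term.substring(0, term.length() - 2);
        }
        // Strip one common suffix, but only if at least 3 characters remain.
        for (String suffix : new String[] {"ing", "er"}) {
            if (term.endsWith(suffix) && term.length() - suffix.length() >= 3) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    /** Lowercases the term, then stems it unless it is on the protected list. */
    static String stem(String term, Set<String> protectedTerms) {
        String lower = term.toLowerCase(Locale.ROOT);
        if (protectedTerms.contains(lower)) {
            return lower; // protected: returned unstemmed
        }
        return toyStem(lower);
    }

    public static void main(String[] args) {
        Set<String> protectedTerms = Set.of("lowe's"); // assumed sample list
        System.out.println(stem("Lowe's", protectedTerms));
        System.out.println(stem("searching", protectedTerms));
    }
}
```

The protected list holds lowercased terms because the lookup runs after
downcasing. The same list has to be applied at both index time and query
time, otherwise a protected term in the index will not match its stemmed form
in a query.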

On Sat, Jan 9, 2010 at 4:00 AM, Jamie <jamie@stimulussoft.com> wrote:
> Hi All
>
> Is there another stemmer we can use that is perhaps not as aggressive as the
> Porter Stemmer? I.e., the stemming could remove "ing"s and "er"s, but not do
> something so significant as to convert "Lowe's" to "Low".
>
> Thanks
>
> Jamie
>
> Will Murnane wrote:
>>
>> On Fri, Jan 8, 2010 at 16:27, Jamie <jamie@stimulussoft.com> wrote:
>>
>>>
>>> Hi Ian / Will
>>>
>>> Thanks. Surely the Porter Stemmer should not stem proper nouns, i.e. it
>>> could check the capitalization of the first letter of a word and whether
>>> or not the word is at the start of a sentence. If so, it could choose not
>>> to apply any stemming. Or am I completely out of whack?
>>>
>>
>> Look again: you're downcasing the terms before the Porter filter ever
>> sees them (which is, AIUI, necessary).  You might do well to combine
>> the tokenizing and downcasing step with some heuristic to find proper
>> nouns and not downcase or stem them.
>>
>> Will
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> Stimulus Software - MailArchiva
> Email Archiving And Compliance
> USA Tel: +1-713-343-8824 ext 100
> UK Tel: +44-20-80991035 ext 100
> Email:  jamie@stimulussoft.com
> Web: http://www.mailarchiva.com
> To receive MailArchiva Enterprise Edition product announcements, send a
> message to: <mailarchiva-enterprise-edition-subscribe@stimulussoft.com>
>
>
>
>
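
The heuristic Will suggests in the quoted text (find proper nouns while
tokenizing, and neither downcase nor stem them) could be sketched roughly as
follows. This is an illustration only, not code from the thread: the class
ProperNounAwareNormalizer and its normalize method are hypothetical names,
and the rule used here (a capitalized token that does not start a sentence is
treated as a proper noun) is one assumed heuristic among many. It will miss
proper nouns at the start of a sentence and will wrongly keep words such
as "I".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: tokenize and downcase, but leave likely proper nouns alone. */
final class ProperNounAwareNormalizer {

    static List<String> normalize(String text) {
        List<String> out = new ArrayList<>();
        boolean sentenceStart = true;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Remember whether this token closes a sentence before stripping punctuation.
            char last = raw.charAt(raw.length() - 1);
            boolean endsSentence = last == '.' || last == '!' || last == '?';

            // Strip leading and trailing punctuation; inner apostrophes are kept.
            String token = raw.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!token.isEmpty()) {
                boolean capitalized = Character.isUpperCase(token.charAt(0));
                if (capitalized && !sentenceStart) {
                    out.add(token); // assumed proper noun: kept as-is, to be skipped by the stemmer
                } else {
                    out.add(token.toLowerCase(Locale.ROOT)); // ordinary term: downcased, then stemmed later
                }
            }
            sentenceStart = endsSentence;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalize("We shopped at Lowe's yesterday. Prices were low."));
    }
}
```

A downstream stemming step would then need some way to recognise the tokens
that were left untouched, for example by checking whether the first character
is still uppercase. Queries typed in all lowercase would not match the
preserved forms, so the same decision has to be made consistently at query
time.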


