lucene-java-user mailing list archives

From "Eric Isakson" <>
Subject RE: Phrase query and porter stemmer
Date Wed, 12 Feb 2003 18:11:46 GMT
If you index with the stemmer, you won't get hits for "security" unless you also run the stemmer
on the query: the stem of "security", not the word itself, is the token that gets stored in the index.

If you don't use the stemming algorithm when you create the index, you could search for "security"
and get only those documents that contain "security".

See the FAQ

If you have a list of terms you want to treat differently (i.e. certain words you know you don't
want to stem), you could build a custom TokenFilter that checks each token against that list
before applying the stemming algorithm, and then add that TokenFilter to your analyzer.
You might also consider letting the tokens be stemmed while also adding the original, non-stemmed
term at the same position using Token.setPositionIncrement(0). You might also want to find some
way to boost the score of those non-stemmed tokens when you build your query (I'm not sure how
you would accomplish that, but some custom query-parsing code could do the trick).

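The same-position suggestion can also be modelled without Lucene. In this sketch `Tok` is a hypothetical stand-in for Lucene's Token (term text plus position increment), `crudeStem` is a placeholder rather than Porter, and `phraseMatches` is a simplified positional check, not Lucene's PhraseQuery implementation. Each stem is emitted normally and the original word follows with increment 0, so a quoted phrase can be matched on original forms while ordinary terms still match on stems.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SamePositionDemo {

    // Hypothetical stand-in for a Lucene Token: term text plus the position
    // increment relative to the previous token (0 = same position).
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Placeholder stemmer (NOT Porter): strips a few common suffixes.
    static String crudeStem(String word) {
        for (String suffix : new String[] {"ities", "ity", "ies", "es", "s", "e"}) {
            if (word.length() > suffix.length() + 3 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Emit the stem at the next position, then the original word with
    // increment 0 (only when it differs) so both share that position.
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            String stem = crudeStem(word);
            out.add(new Tok(stem, 1));
            if (!stem.equals(word)) {
                out.add(new Tok(word, 0));
            }
        }
        return out;
    }

    // Build a per-document map from each term to the positions where it occurs.
    static Map<String, List<Integer>> positions(List<Tok> toks) {
        Map<String, List<Integer>> index = new HashMap<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.positionIncrement;
            index.computeIfAbsent(t.term, k -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    // True if the phrase terms occur at consecutive positions in the document.
    static boolean phraseMatches(Map<String, List<Integer>> index, List<String> phrase) {
        for (int start : index.getOrDefault(phrase.get(0), List.of())) {
            boolean ok = true;
            for (int i = 1; i < phrase.size() && ok; i++) {
                ok = index.getOrDefault(phrase.get(i), List.of()).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> doc1 = positions(analyze("Database security policy"));
        Map<String, List<Integer>> doc2 = positions(analyze("Secure database access"));

        // Exact phrase on original (unstemmed) forms: only doc1 matches.
        System.out.println(phraseMatches(doc1, List.of("database", "security"))); // true
        System.out.println(phraseMatches(doc2, List.of("database", "security"))); // false

        // Plain term on the stemmed form: "secure" in doc2 still matches.
        System.out.println(phraseMatches(doc2, List.of(crudeStem("security")))); // true
    }
}
```

Because originals and stems share one term space here, a stem that happens to spell some other real word could cause a false exact match; prefixing original forms with a marker character at both index and query time is one way to keep them apart. Boosting the exact-form clauses is a query-side step this sketch leaves out.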

-----Original Message-----
From: Mailing Lists Account []
Sent: Wednesday, February 12, 2003 4:17 AM
Subject: Phrase query and porter stemmer


I use PorterStemmer with my analyzer for indexing the documents.
And I have been using the same analyzer for searching too.

When I search for a phrase like "security" AND database, I would like to
avoid matches for terms like "secure" or "securities". I observed that Google
and a couple of other search engines do not return such matches.

1) In other words, in a single query, is it possible to skip the Porter
   stemmer for phrase queries and use it for other queries (such as TermQuery)?

2) As an alternative, is it advisable to manually construct a PhraseQuery by
   adding terms without applying the Porter stemmer?


To unsubscribe, e-mail:
For additional commands, e-mail:
