lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gordon Rogers <>
Subject Re: Wildcard with Words Ending in 'e'
Date Thu, 04 May 2006 08:01:29 GMT

Your shot in the dark was correct, I am using Porter Stemming. Thanks to your excellent response
understand now why wildcards and stemming are now very useful when combined. I'm going to

investigate other means of taking into account the wildcard queries and handling them in a
way. Rest assured that if I find a solution I'll post it to the user group.


Chris Hostetter wrote:
> I'm going to take a shot in the dark and guess that the field in question
> is analyzed using some form of Stemming which strips off trailing "e"
> characters (Porter Stemmer seems to do this).
> In the past, there was a bug in the way WildcardQuery works, so that the
> metacharacter "?" matches 0 or 1 characters even though it's documented to
> match exactly one character -- so "peopl?" would match both "peopl" and
> peoplX" -- which wasn't correct.
> If i'm right about why you are encountering this problem, then your test
> case isn't really valid.  Either you should do you wildcard queries on a
> field that isn't stemmed, or you should take the stemming into account
> (somehow ... if you think of a good way i'm sure others would be
> interested) when building your wildcard query.
> Keep in mind that trying to use wildcard queries on a field which uses
> porter stemming isn't generally a good idea regardless of wether you like
> the old behavior of "?" or not ... if your goal is to find documents
> containing words like "elephant" and "elephantine" so you search for
> "elephan*" you are going to miss any documents containing "elephant"
> because Porter Stemmer trims that to "eleph"
> -Hoss
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message