lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Wildcard with Words Ending in 'e'
Date Wed, 03 May 2006 18:00:58 GMT

I'm going to take a shot in the dark and guess that the field in question
is analyzed using some form of Stemming which strips off trailing "e"
characters (Porter Stemmer seems to do this).

In the past, there was a bug in the way WildcardQuery works, so that the
metacharacter "?" matches 0 or 1 characters even though it's documented to
match exactly one character -- so "peopl?" would match both "peopl" and
peoplX" -- which wasn't correct.

   http://issues.apache.org/jira/browse/LUCENE-306

If i'm right about why you are encountering this problem, then your test
case isn't really valid.  Either you should do you wildcard queries on a
field that isn't stemmed, or you should take the stemming into account
(somehow ... if you think of a good way i'm sure others would be
interested) when building your wildcard query.

Keep in mind that trying to use wildcard queries on a field which uses
porter stemming isn't generally a good idea regardless of wether you like
the old behavior of "?" or not ... if your goal is to find documents
containing words like "elephant" and "elephantine" so you search for
"elephan*" you are going to miss any documents containing "elephant"
because Porter Stemmer trims that to "eleph"


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message