lucene-java-user mailing list archives

From "aaz"
Subject wildcards, stemming and searching
Date Wed, 09 Feb 2005 23:26:52 GMT
We are not using QueryParser; we build our queries with custom code.

We have an index containing various documents. Each document is analyzed and indexed with the following chain:

StandardTokenizer() -> StandardFilter() -> LowerCaseFilter() -> StopFilter() ->
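For readers less familiar with what each stage does, here is a rough, stdlib-only Java sketch of the chain. It is a toy stand-in, not Lucene's API: the tokenizer regex, the stop list, and the suffix-stripping stem() are hypothetical simplifications. A stemming stage is assumed at the end because the chain above ends with an unnamed step and the message later refers to stemming; StandardFilter's cleanup (possessives, acronyms) is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalysisChain {
    // Hypothetical stop list; Lucene's StopFilter is given its own word set.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Rough stand-in for StandardTokenizer: split on anything that is not a
    // letter or digit, so punctuation such as '*' is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy suffix stripper standing in for the stemming step; NOT the Porter algorithm.
    static String stem(String term) {
        if (term.endsWith("ed") && term.length() > 4) {
            return term.substring(0, term.length() - 2);
        }
        if (term.endsWith("s") && !term.endsWith("ss") && term.length() > 3) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Tokenize -> lowercase -> drop stop words -> stem, in the order of the chain above.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(lower)) {
                continue;
            }
            out.add(stem(lower));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The United States")); // [unit, state]
        System.out.println(analyze("united*"));           // [unit]
    }
}
```

With this toy chain, the text "The United States" analyzes to [unit, state], and the query value "united*" analyzes to [unit]: the "*" is dropped by the tokenizer before any later stage sees it.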

We also want to support wildcard queries, so for each incoming query we need to handle a "*"
in the value side of the comparison. We also need to run the value side of the query through
the same analyzer that was used to build the index. This leads to some problems, and we would
like your opinion on a solution.

Suppose the user queries:

somefield = united*

When the analyzer processes "united*", we get back "unit": the tokenizer discards the "*" and
the stemmer reduces "united" to "unit". As a result, we cannot tell that the user requested a wildcard.

Let's say we come up with some way to "escape" the "*" character before the analyzer sees it.
For example:

somefield = united*  -> unitedXXWILDCARDXX

After analysis this becomes "unitedxxwildcardxx", which we can recognize and turn into a WildcardQuery.
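The escape-and-restore idea can be sketched in plain Java. The marker string comes from the example above; the method names escape, analyzeEscaped, restore and isWildcard are hypothetical, and "analysis" is reduced to lowercasing on the assumption (implied by the example) that the rest of the chain passes the escaped token through unchanged.

```java
import java.util.Locale;

public class WildcardEscape {
    // Marker from the example above; any string the analyzer keeps intact would do.
    static final String MARKER = "XXWILDCARDXX";

    // Step 1: hide '*' from the analyzer by replacing it with the marker.
    static String escape(String value) {
        return value.replace("*", MARKER);
    }

    // Step 2: stand-in for analysis. Only lowercases, assuming the rest of
    // the chain leaves the escaped token unchanged.
    static String analyzeEscaped(String escaped) {
        return escaped.toLowerCase(Locale.ROOT);
    }

    // After analysis the marker is lowercase, so look for that form.
    static boolean isWildcard(String analyzed) {
        return analyzed.contains(MARKER.toLowerCase(Locale.ROOT));
    }

    // Step 3: turn the analyzed marker back into '*', giving a wildcard pattern.
    static String restore(String analyzed) {
        return analyzed.replace(MARKER.toLowerCase(Locale.ROOT), "*");
    }

    public static void main(String[] args) {
        String escaped = escape("united*");        // unitedXXWILDCARDXX
        String analyzed = analyzeEscaped(escaped); // unitedxxwildcardxx
        System.out.println(isWildcard(analyzed) + " " + restore(analyzed)); // true united*
    }
}
```

In Lucene, the restored pattern would then be wrapped as `new WildcardQuery(new Term(field, pattern))`.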

The problem here is that the term "united" never exists in the index: at indexing time it was
stemmed to "unit", whereas our escape mechanism kept the query value from being stemmed the same way.
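The mismatch can be reproduced with a toy stemmer that strips a trailing "ed". This is a hypothetical stand-in for the real stemming filter, not the Porter algorithm. The index holds "unit", the escaped token is left unstemmed, and the restored pattern "united*" therefore cannot match the indexed term.

```java
public class StemMismatch {
    // Toy stand-in for the stemming step (NOT Porter): strips a trailing "ed".
    static String stem(String term) {
        return term.endsWith("ed") && term.length() > 4
                ? term.substring(0, term.length() - 2)
                : term;
    }

    // A pattern with a single trailing '*' matches any term starting with the prefix.
    // Assumes the pattern ends with '*'.
    static boolean prefixMatches(String pattern, String indexTerm) {
        String prefix = pattern.substring(0, pattern.length() - 1);
        return indexTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        String indexTerm = stem("united");            // "unit": what the index holds
        String escapedToken = stem("unitedxxwildcardxx"); // unchanged: the marker blocks stemming
        String pattern = "united*";                   // pattern after restoring the marker
        System.out.println(indexTerm + " / " + escapedToken);
        System.out.println(prefixMatches(pattern, indexTerm)); // false
    }
}
```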

How can I solve this problem?
