lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: wildcards, stemming and searching
Date Thu, 10 Feb 2005 15:55:25 GMT
How would you deal with a query like "a*z" though?

I suspect, however, that you only care about suffix queries and 
stemming those.  If thats the case, then you could subclass 
getWildcardQuery and do internal stemming (remove trailing wildcard, 
run it through the analyzer directly there and return a modified 
WildcardQuery instance.

With wildcard queries though, this is risky.  Prefixes won't 
necessarily stem to what the full word would stem to.

	Erik


On Feb 9, 2005, at 6:26 PM, aaz wrote:

> Hi,
> We are not using QueryParser and have some custom Query construction.
>
> We have an index that indexes various documents. Each document is 
> Analyzed and indexed via
>
> StandardTokenizer() ->StandardFilter() -> LowercaseFilter() -> 
> StopFilter() -> PorterStemFilter()
>
> We also want to support wildcard queries, hence on an inbound query we 
> need to deal with "*" in the value side of the comparison. We also 
> need to "analyze" the value side of the query against the same 
> analyzer in which the index was built with. This leads to some 
> problems and would like your solution opinion.
>
> User queries.
>
> somefield = united*
>
> After the analyzer hits "united*", we get back "unit". Hence we cannot 
> detect that the user requested a wildcard.
>
> Lets say we come up with some solution to "escape" the "*" char before 
> the Analyzer hits it. For example
>
> somefield = united*  -> unitedXXWILDCARDXX
>
> After analysis this then becomes "unitedxxwildcardxx", which we can 
> then turn into a WildcardQuery "united*"
>
> The problem here is that the term "united" will never exist in the 
> indexing due to the stemming which did not stem properly due to our 
> escape mechanism.
>
> How can I solve this problem?
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message