lucene-java-user mailing list archives

From "aaz"...@interoperate.com>
Subject Re: wildcards, stemming and searching
Date Thu, 10 Feb 2005 15:59:05 GMT
> How would you deal with a query like "a*z" though?
>

Yeah I know, a user submitting that is certainly possible. I have no idea. I 
am starting to think that NOT stemming on indexing might be the safest 
solution.
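
For illustration, here is a minimal Java sketch of why an unstemmed index makes wildcards easier to handle. It is not Lucene code: WildcardOverTerms, toPattern and matchingTerms are hypothetical names, and the sketch assumes the usual wildcard semantics ("*" for any run of characters, "?" for exactly one). The pattern is only lowercased and then compared against the indexed terms as they are, so a query like "a*z" needs no stemming decision at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: matches a wildcard pattern
// against a plain list of indexed terms.
final class WildcardOverTerms {

    // Translate a wildcard pattern into a regex, quoting the literal parts.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Return the indexed terms that the (lowercased) pattern matches in full.
    static List<String> matchingTerms(String wildcard, List<String> indexedTerms) {
        Pattern pattern = toPattern(wildcard.toLowerCase(Locale.ROOT));
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With unstemmed terms such as "united" and "unitedly" in the term list, "united*" finds both. Against a stemmed term list that only holds "unit" it finds nothing, which is the mismatch described in the original message.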


----- Original Message ----- 
From: "Erik Hatcher" <erik@ehatchersolutions.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, February 10, 2005 8:55 AM
Subject: Re: wildcards, stemming and searching


> How would you deal with a query like "a*z" though?
>
> I suspect, however, that you only care about suffix queries and stemming
> those.  If that's the case, then you could subclass QueryParser, override
> getWildcardQuery and do internal stemming (remove the trailing wildcard,
> run it through the analyzer directly there) and return a modified
> WildcardQuery instance.
>
> With wildcard queries though, this is risky.  Prefixes won't necessarily 
> stem to what the full word would stem to.
>
> Erik
>
>
> On Feb 9, 2005, at 6:26 PM, aaz wrote:
>
>> Hi,
>> We are not using QueryParser and have some custom Query construction.
>>
>> We have an index that indexes various documents. Each document is 
>> Analyzed and indexed via
>>
>> StandardTokenizer() -> StandardFilter() -> LowerCaseFilter() ->
>> StopFilter() -> PorterStemFilter()
>>
>> We also want to support wildcard queries, hence on an inbound query we
>> need to deal with "*" in the value side of the comparison. We also need
>> to "analyze" the value side of the query with the same analyzer that
>> the index was built with. This leads to some problems, and we would
>> like your opinion on a solution.
>>
>> User queries.
>>
>> somefield = united*
>>
>> After the analyzer hits "united*", we get back "unit". Hence we cannot 
>> detect that the user requested a wildcard.
>>
>> Let's say we come up with some solution to "escape" the "*" char before
>> the Analyzer hits it. For example
>>
>> somefield = united*  -> unitedXXWILDCARDXX
>>
>> After analysis this then becomes "unitedxxwildcardxx", which we can then 
>> turn into a WildcardQuery "united*"
>>
>> The problem here is that the term "united" will never exist in the
>> index: stemming stored it as "unit", while our escape mechanism kept
>> the query term from being stemmed the same way.
>>
>> How can I solve this problem?
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
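
The suggestion quoted above (strip the trailing wildcard, stem what is left, put the wildcard back) can be sketched in plain Java as follows. This is only a sketch under stated assumptions: SuffixWildcardStemming, rewriteSuffixWildcard and toyStem are hypothetical names, and toyStem is a deliberately crude suffix stripper standing in for the real analyzer chain; it is NOT the Porter algorithm. Patterns with an inner wildcard, such as "a*z", are left unstemmed.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene.
final class SuffixWildcardStemming {

    // Toy stand-in for a stemmer (assumption): strips a few English suffixes.
    static String toyStem(String term) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (term.endsWith(suffix) && term.length() > suffix.length() + 2) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    // Stem plain terms; for "prefix*" stem the prefix and re-append "*";
    // leave every other wildcard pattern lowercased but otherwise untouched.
    static String rewriteSuffixWildcard(String raw, UnaryOperator<String> stemmer) {
        String lower = raw.toLowerCase(Locale.ROOT);
        boolean hasWildcard = lower.indexOf('*') >= 0 || lower.indexOf('?') >= 0;
        if (!hasWildcard) {
            return stemmer.apply(lower);
        }
        boolean trailingStarOnly = lower.indexOf('*') == lower.length() - 1
                && lower.indexOf('?') < 0;
        if (trailingStarOnly) {
            String prefix = lower.substring(0, lower.length() - 1);
            return stemmer.apply(prefix) + "*";
        }
        return lower;
    }
}
```

Under this toy stemmer, "united*" is rewritten to "unit*", which can match the stemmed term "unit". The risk mentioned in the quoted reply also shows up: "walki*" stays "walki*" because the prefix "walki" does not stem, while the indexed form of "walking" would be "walk", so the rewritten query misses it.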


