lucene-java-user mailing list archives

From David_Birthw...@VWR.COM
Subject Re: Lowercasing wildcards - why?
Date Fri, 30 May 2003 16:06:23 GMT

True enough.  We're supporting search of a product database, so, for us, it
made sense to increase coverage and accept the loss of precision.  Our
solution is definitely not globally applicable.

DaveB

Leo Galambos <Leo.G@seznam.cz>
05/30/03 11:55 AM
Please respond to "Lucene Users List"
To: Lucene Users List <lucene-user@jakarta.apache.org>
cc:
Subject: Re: Lowercasing wildcards - why?

Ah, I got it. THX. In the good old days, wildcards were used as a
fix for a missing stemming module. I am not sure you can combine these
two opposite approaches successfully. I see the following drawbacks of
your solution.

Example:
built* (->built) could be changed to build* (no built, but ->builder,
building, etc.), and precision will go down drastically.

You probably use a stemmer with one important bug (a.k.a. feature),
overstemming, so here is another example:
political* (->political, politically) is transformed to polic*
(->policer, policy, policies, policemen, etc.) by the Porter algorithm, and
precision is again affected drastically.
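
To make the trade-off concrete, here is a minimal Java sketch of the query rewrite described in the quoted message further down (strip the trailing wildcard, stem the prefix, re-append the wildcard). Everything in it is an assumption for illustration: toyStem is a hard-coded stand-in for an analyzer's stemming step, not the Porter or Snowball algorithm, and the class and method names are hypothetical, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PrefixStemSketch {
    // Toy stand-in for the stemming step of an analyzer (assumption, not a real stemmer):
    // lowercases, then strips a trailing "er" or "e" from words longer than four letters.
    static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.length() > 4 && t.endsWith("er")) return t.substring(0, t.length() - 2);
        if (t.length() > 4 && t.endsWith("e")) return t.substring(0, t.length() - 1);
        return t;
    }

    // Strip the trailing wildcard, stem the prefix, and put the wildcard back.
    static String rewritePrefixQuery(String query) {
        if (!query.endsWith("*")) return toyStem(query);
        String prefix = query.substring(0, query.length() - 1);
        return toyStem(prefix) + "*";
    }

    // Return the original words whose index-time (stemmed) term starts with the query prefix.
    static List<String> matches(String prefixQuery, List<String> documentWords) {
        String prefix = prefixQuery.substring(0, prefixQuery.length() - 1);
        List<String> hits = new ArrayList<>();
        for (String word : documentWords) {
            if (toyStem(word).startsWith(prefix)) hits.add(word);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("searcher", "searching", "searchlight");
        // Unstemmed prefix misses "searcher", which was indexed as "search".
        System.out.println(matches("searche*", words));          // []
        // Stemmed prefix finds it, but also everything else under "search".
        String rewritten = rewritePrefixQuery("searche*");
        System.out.println(rewritten);                           // search*
        System.out.println(matches(rewritten, words));           // [searcher, searching, searchlight]
    }
}
```

With this toy data the rewritten query recovers the lost hit for "searcher", at the cost of also matching "searching" and "searchlight", which the user's "searche*" never asked for.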

-g-

David_Birthwell@VWR.COM wrote:

>Your analyzers can optionally incorporate stemming, along with the other
>things that analyzers do (lowercasing, etc...).  The stemming algorithms
>are all different.  This "searcher" example was made up, but, there are
>instances where stemming at index time and not stemming wildcard searches
>will result in lost hits.  Specifically, we encountered this situation
>using the optional Snowball analyzers (which work great, by the way).
>
>DaveB
>
>Leo Galambos <Leo.G@seznam.cz>
>05/30/03 10:26 AM
>Please respond to "Lucene Users List"
>To: Lucene Users List <lucene-user@jakarta.apache.org>
>cc:
>Subject: Re: Lowercasing wildcards - why?
>
>I'm sorry, I did not read the complete thread. Do you mean analyzer ==
>stemmer? Does it really work? If I were a stemmer, I would leave "searche"
>intact. ;-)
>
>-g-
>
>David_Birthwell@VWR.COM wrote:
>
>
>
>>Hi Les,
>>
>>We ended up modifying the QueryParser to pass prefix and suffix queries
>>through the Analyzer.  For us, it was about stemming.  If you decide to use
>>an analyzer that incorporated stemming, there are cases where wildcard
>>queries will not return the expected results.
>>
>>Example:  "searcher" will probably get stemmed to "search".  A search on
>>"searche*" should hit the term "searcher", but, it won't, all instances of
>>"searcher" having been stemmed to "search" at index time.  Our solution was
>>to remove the trailing wildcard and send "searche" to the analyzer, then
>>tack the wildcard character back on there and create the PrefixQuery object
>>with the new search string "search*".
>>
>>DaveB
>>
>>Leslie Hughes <Leslie.Hughes@watercorporation.com.au>
>>05/30/03 01:09 AM
>>Please respond to "Lucene Users List"
>>To: "'lucene-user@jakarta.apache.org'" <lucene-user@jakarta.apache.org>
>>cc:
>>Subject: Lowercasing wildcards - why?
>>
>>Hi,
>>
>>I was just wondering what the rationale is behind lowercasing wildcard
>>queries produced by QueryParser? It's just that my data is all upper case
>>and my analyser doesn't lowercase so it seems a bit odd that I have to call
>>setLowercaseWildcardTerms(false). Couldn't queryparser leave the terms
>>unnormalised or better still pass them through the analyser?
>>
>>I'm sure there's a good reason for it though.....
>>
>>
>>Les
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org



