lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Problem with query and wildcard
Date Fri, 20 May 2005 09:28:05 GMT

On May 20, 2005, at 3:55 AM, Möckl Susanne wrote:
> In my index are (for example) 4 documents which contains the word  
> simone.
>
> The problem is that lucene does not find all documents by some inputs.
>
> To explain here some examples and what they return:
>
>
>             input: simone    returns: 4 hits
>             input: simon      returns: 4 hits
>             input: simo        returns: 4 hits
>             input: sim         returns: 0 hits
>             input: sim*        returns: 4 hits
>             input: simo*      returns: 4 hits
>             input: simon*     returns: 0 hits
>             input: simon?    returns: 0 hits
>
> The returns are not correct. Why does lucene find all results when  
> writing simo without wildcard but no result when writing simon with  
> wildcard.
>
> What is lucene doing with the input? (I´m using GermanAnalyzer)
>
> Have I made something wrong or is this a problem from lucene?

Thanks for specifying which Analyzer you're using.  That is the key  
to this issue.  GermanAnalyzer analyzes "simone", "simon", and "simo"  
to "simo", and "sim" is unchanged.  So the first four queries you  
provided are explained in this manner (you're using GermanAnalyzer  
with QueryParser also, I presume).

QueryParser does not analyze wildcard queries and remember that  
"simo" is what was indexed, so "sim*" and "simo*" work.  "simon*" and  
"simon?" do not because nothing was indexed with the prefix "simon".

Make sense?

So what's the solution?  You'll need to tinker with the analyzer  
decision until you find one that works for you.  The AnalyzerDemo  
program that comes with the source code to Lucene in Action will  
help.  Download the source code from www.lucenebook.com, unzip it,  
and run "ant AnalyzerDemo".  To help in my reply, I adjusted the  
AnalyzerDemo code locally adding the last items in each of the arrays  
like this:

   private static final String[] examples = {
     "The quick brown fox jumped over the lazy dogs",
     "XY&Z Corporation - xyz@example.com",
     "simone simon simo sim"
   };

   private static final Analyzer[] analyzers = new Analyzer[]{
     new WhitespaceAnalyzer(),
     new SimpleAnalyzer(),
     new StopAnalyzer(),
     new StandardAnalyzer(),
     new GermanAnalyzer()
   };

and got this relevant output:

Analyzing "simone simon simo sim"
   WhitespaceAnalyzer:
     [simone] [simon] [simo] [sim]

   SimpleAnalyzer:
     [simone] [simon] [simo] [sim]

   StopAnalyzer:
     [simone] [simon] [simo] [sim]

   StandardAnalyzer:
     [simone] [simon] [simo] [sim]

   GermanAnalyzer:
     [simo] [simo] [simo] [sim]



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message