lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: search item with '-' in it
Date Wed, 04 Jun 2003 16:58:40 GMT
You should look at the output of your analyzer.  Just write a simple 
test program, something like:

   public static void main(String[] args) throws Exception {
     System.out.println("Tokenizing " + args[0]);
     Analyzer analyzer = new MyAnalyzer(...);
     TokenStream ts = analyzer.tokenStream(new StringReader(args[0]));
     Token token;
     while ((token = ts.next()) != null) {
       System.out.println("Token: " + token.termText());
     }
   }

StandardAnalyzer will accept hyphenations when digits are included on 
one side or the other.  This is a heuristic used to index things like 
part numbers (which contain digits) as a single word but not index 
things like "long-hyphenated-phrase" as a single word.  It may not be 
appropriate for your application.

Also, a part number field might better be indexed as a keyword field...

Doug

Lixin Meng wrote:
> I have a field, 'PartNumber', that has '-' in its value (e.g.
> SG-XRRH-C1M0-A).
> 
> After indexing, I can perform certain queries. However, I feel confused to
> explain the behavior.
> 
> - if searching for
> 	PartNumber:"SG"
>   it will return multiple hits. I assume the anaylzer might take out '-'.
> 
> - if searching for
> 	PartNumber:"XRRH"
>   it will return no hit. So, the above assumption doesn't hold itself. :)
> 
> - if searching for
> 	PartNumber:"SG-XRRH-C1M0-A"
>   it will return one hit
> 
> - if searching for
>       PartNumber:"sg-xrrh-c1m0-a*"
>   it will return one hit. So far so good
> 
> - if searching for
>       PartNumber:sg-xrrh-c1m0-a*
>   it will return multiple hits which even include things like
> "SG-XSWBRO...". Why?
> 
> - if searching for
>       PartNumber:"sg-xrrh-c1m0*"
>   no hit. Why?
> 
> Any comments?
> 
> Regards,
> Lixin
> 
> P.S. I used following filters
> 
>     result = new StandardFilter(result);
>     result = new LowerCaseFilter(result);
>     result = new StopFilter(result, m_StopWordTable);
>     result = new PorterStemFilter(result);
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message