lucene-dev mailing list archives

From "Che Dong" <ched...@hotmail.com>
Subject Re: search item with '-' in it
Date Sun, 01 Jun 2003 15:07:42 GMT
The default analyzer tokenizes the source using only isLetter() in SimpleTokenizer,
so other characters such as "_", "#" and "-" are ignored.

Some applications may need isLetterOrDigit() instead. Perhaps a constructor
SimpleTokenizer(char[] validChars) could be added, similar to the way the initial stop words
are passed to StopFilter, so that we can specify which characters are tokenized as "letters".
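
To make the idea concrete, here is a minimal stand-alone sketch of such a tokenizer. It does not use or extend any Lucene class: the name ConfigurableTokenizer, the allowDigits flag and the extraValidChars parameter are hypothetical and only illustrate how the set of token characters could be made configurable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not a Lucene class or API.
class ConfigurableTokenizer {
    private final boolean allowDigits;
    private final Set<Character> extraValidChars = new HashSet<>();

    // allowDigits: use isLetterOrDigit() instead of isLetter().
    // extraValidChars: additional characters to keep inside a token, e.g. '-'.
    ConfigurableTokenizer(boolean allowDigits, char[] extraValidChars) {
        this.allowDigits = allowDigits;
        for (char c : extraValidChars) {
            this.extraValidChars.add(c);
        }
    }

    // Decides whether a character belongs to a token or acts as a separator.
    boolean isTokenChar(char c) {
        boolean base = allowDigits ? Character.isLetterOrDigit(c) : Character.isLetter(c);
        return base || extraValidChars.contains(c);
    }

    // Splits the text into maximal runs of token characters.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With letters only, "SG-XRRH-C1M0-A" splits into SG, XRRH, C, M, A. With allowDigits set, it splits into SG, XRRH, C1M0, A. With '-' passed as an extra valid character as well, the whole part number stays one token.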

Regards

Che, Dong
http://www.chedong.com/

----- Original Message ----- 
From: "Lixin Meng" <lixin@fulldegree.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Sunday, June 01, 2003 12:26 PM
Subject: search item with '-' in it


> Sorry for re-posting this message. Since I didn't get any response from
> user-list, I hope someone on developer-list can answer it.
> 
> I have a field, 'PartNumber', that has '-' in its value (e.g.
> SG-XRRH-C1M0-A).
> 
> After indexing, I can perform certain queries. However, I am unable to
> explain the behavior.
> 
> - if searching for
> PartNumber:"SG"
> or
> PartNumber:"A"
>   it will return multiple hits. I assume the analyzer might take out '-'.
> 
> - if searching for
> PartNumber:"XRRH"
>   it will return no hit. So, the above assumption doesn't hold. :)
> 
> - if searching for
> PartNumber:"SG-XRRH-C1M0-A"
>   it will return one hit
> 
> - if searching for
>       PartNumber:"sg-xrrh-c1m0-a*"
>   it will return one hit. So far so good
> 
> - if searching for
>       PartNumber:sg-xrrh-c1m0-a*
>   it will return multiple hits which even include things like
> "SG-XSWBRO...". Why?
> 
> - if searching for
>       PartNumber:"sg-xrrh-c1m0*"
>   no hit. Why?
> 
> Any comments?
> 
> Regards,
> Lixin
> 
> P.S. I used following filters
> 
>     result = new StandardFilter(result);
>     result = new LowerCaseFilter(result);
>     result = new StopFilter(result, m_StopWordTable);
>     result = new PorterStemFilter(result);
> 