lucene-dev mailing list archives

From "Lixin Meng" <li...@fulldegree.com>
Subject RE: search item with '-' in it
Date Sun, 01 Jun 2003 16:42:43 GMT
Thanks for the reply.

If it treats '-' as a delimiter, I assume that means 'SG-XRRH-C1M0-A'
will be parsed as 'SG XRRH C1M0 A'. However, why is there no result when I
search for 'XRRH', although 'SG' and 'A' do generate some hits?

Regards,
Lixin

-----Original Message-----
From: Che Dong [mailto:chedong@hotmail.com]
Sent: Sunday, June 01, 2003 8:08 AM
To: Lucene Developers List
Subject: Re: search item with '-' in it


The default analyzer tokenizes the source using only isLetter() in
SimpleTokenizer; other characters such as "_", "#", and "-" are ignored.

Some applications may need isLetterOrDigit() instead. Perhaps a constructor
such as SimpleTokenizer(char[] validChars) could be added so that, much like
passing the initial stop words to StopFilter, we can specify which characters
are tokenized as "letters".

Regards

Che, Dong
http://www.chedong.com/
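
The character-class rules described in the reply above can be illustrated
with a small standalone sketch. It does not use Lucene at all: the class
CharTokenizerSketch and the helper lettersPlus(char[] validChars) are
hypothetical names standing in for the proposed
SimpleTokenizer(char[] validChars) constructor, and the rest of the analyzer
chain (StandardFilter, LowerCaseFilter, StopFilter, PorterStemFilter) is not
modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Standalone illustration only; this is not Lucene code. */
final class CharTokenizerSketch {

    /** Splits text into maximal runs of characters accepted by isTokenChar. */
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A rejected character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /**
     * Hypothetical stand-in for SimpleTokenizer(char[] validChars):
     * accepts letters plus any caller-supplied characters.
     */
    static IntPredicate lettersPlus(char[] validChars) {
        String extra = new String(validChars);
        return c -> Character.isLetter(c) || extra.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        String part = "SG-XRRH-C1M0-A";
        // Letters only: prints [SG, XRRH, C, M, A]
        System.out.println(tokenize(part, Character::isLetter));
        // Letters or digits: prints [SG, XRRH, C1M0, A]
        System.out.println(tokenize(part, Character::isLetterOrDigit));
        // Letters plus '-' and digits: prints [SG-XRRH-C1M0-A]
        System.out.println(tokenize(part, lettersPlus("-0123456789".toCharArray())));
    }
}
```

With plain isLetter() the digits in C1M0 act as separators as well, which is
why isLetterOrDigit() or an explicit list of valid characters may suit part
numbers better.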

----- Original Message -----
From: "Lixin Meng" <lixin@fulldegree.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Sunday, June 01, 2003 12:26 PM
Subject: search item with '-' in it


> Sorry for re-posting this message. Since I didn't get any response from
> user-list, I hope someone on developer-list can answer it.
>
> I have a field, 'PartNumber', that has '-' in its value (e.g.
> SG-XRRH-C1M0-A).
>
> After indexing, I can perform certain queries. However, I am unable to
> explain the behavior.
>
> - if searching for
> PartNumber:"SG"
> or
> PartNumber:"A"
>   it will return multiple hits. I assume the analyzer might strip out '-'.
>
> - if searching for
> PartNumber:"XRRH"
>   it will return no hits. So the above assumption doesn't hold. :)
>
> - if searching for
> PartNumber:"SG-XRRH-C1M0-A"
>   it will return one hit
>
> - if searching for
>       PartNumber:"sg-xrrh-c1m0-a*"
>   it will return one hit. So far so good
>
> - if searching for
>       PartNumber:sg-xrrh-c1m0-a*
>   it will return multiple hits which even include things like
> "SG-XSWBRO...". Why?
>
> - if searching for
>       PartNumber:"sg-xrrh-c1m0*"
>   no hit. Why?
>
> Any comments?
>
> Regards,
> Lixin
>
> P.S. I used the following filters:
>
>     result = new StandardFilter(result);
>     result = new LowerCaseFilter(result);
>     result = new StopFilter(result, m_StopWordTable);
>     result = new PorterStemFilter(result);
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>

