lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Strange behaviour of tokenizer with wildcard queries
Date Fri, 20 Sep 2013 12:41:06 GMT
It's reasonable that "block-major" won't find anything.
"block-major-57" should match.
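Ian's first point can be modeled in a few lines. This is a minimal plain-Java sketch, not Lucene's WildcardQuery. It assumes the query was the wildcard block-major* and that, as Lucene's classic QueryParser does by default, the text before the * is compared against indexed terms without being run through the analyzer. The "index" is just the two terms reported in the question. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model: the index is only the set of terms the question reports.
class WildcardPrefixSketch {
    // Terms ClassicAnalyzer produced for "block-major-57", per the question.
    static final List<String> INDEXED_TERMS = List.of("block", "major-57");

    // A trailing '*' is treated as a prefix test on whole indexed terms; the
    // query text itself is NOT tokenized (assumption, see lead-in).
    static List<String> matchingTerms(String query) {
        boolean isPrefix = query.endsWith("*");
        String text = isPrefix ? query.substring(0, query.length() - 1) : query;
        return INDEXED_TERMS.stream()
                .filter(term -> isPrefix ? term.startsWith(text) : term.equals(text))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(matchingTerms("block-major*")); // []
        System.out.println(matchingTerms("block*"));       // [block]
    }
}
```

No indexed term starts with "block-major", so nothing matches. A plain "block-major-57" query is different: the parser analyzes it into the same two tokens, block and major-57, so it has terms to match.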

The split into block and major-57 comes from this rule in the javadocs
for ClassicTokenizer: "Splits words at hyphens, unless there's a
number in the token, in which case the whole token is interpreted as a
product number and is not split."  So I guess it splits on the first
hyphen but not the second.
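The "product number" rule appears to be narrower than that javadoc sentence suggests. As far as I can read the ClassicTokenizer JFlex grammar (its NUM rule), hyphen-separated segments stay together only when every other segment contains a digit, and the scanner keeps the longest match at each position. In "block-major-57" neither "block" nor "major" has a digit, so only "major-57" qualifies. What follows is a simplified, ASCII-only regex model of that rule, not Lucene code. It ignores the grammar's other token types and ClassicAnalyzer's stop-word filtering, and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of ClassicTokenizer's NUM rule.
class ClassicNumRuleSketch {
    private static final String A = "[A-Za-z0-9]+";                  // ALPHANUM
    private static final String D = "[A-Za-z0-9]*[0-9][A-Za-z0-9]*"; // HAS_DIGIT
    private static final String P = "[-_/.,]";                       // punctuation

    private static final Pattern ALPHANUM = Pattern.compile(A);

    // NUM alternatives: segments alternate between "any" and "has a digit".
    private static final List<Pattern> NUM = List.of(
            Pattern.compile(A + P + D),
            Pattern.compile(D + P + A),
            Pattern.compile(A + "(?:" + P + D + P + A + ")+"),
            Pattern.compile(D + "(?:" + P + A + P + D + ")+"),
            Pattern.compile(A + P + D + "(?:" + P + A + P + D + ")+"),
            Pattern.compile(D + P + A + "(?:" + P + D + P + A + ")+"));

    private static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Left-to-right scan; at each token start keep the LONGEST match among
    // plain ALPHANUM and the NUM alternatives, as a JFlex scanner would.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            if (!isAsciiAlnum(text.charAt(pos))) {
                pos++; // separators (and unsupported characters) are skipped
                continue;
            }
            Matcher plain = ALPHANUM.matcher(text).region(pos, text.length());
            plain.lookingAt(); // always succeeds: the char at pos is alphanumeric
            int end = plain.end();
            for (Pattern num : NUM) {
                Matcher m = num.matcher(text).region(pos, text.length());
                if (m.lookingAt()) {
                    end = Math.max(end, m.end());
                }
            }
            // ClassicAnalyzer also lowercases its tokens.
            tokens.add(text.substring(pos, end).toLowerCase(Locale.ROOT));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("block-major-57")); // [block, major-57]
        System.out.println(tokenize("block-major"));    // [block, major]
        System.out.println(tokenize("ab-12-cd"));       // [ab-12-cd]
    }
}
```

The last case shows the alternation: "ab-12-cd" stays whole because its middle segment carries a digit, while "block-major-57" fails the pattern at "major".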

ClassicAnalyzer/ClassicTokenizer is general-purpose and will never meet
everyone's requirements all the time.  You could try a different
analyzer, or build your own.  That's what the javadoc recommends.
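As a rough illustration of the trade-off, here is a plain-Java model (not Lucene code) of a whitespace-plus-lowercase analyzer, roughly what Lucene's WhitespaceTokenizer followed by LowerCaseFilter would produce for simple input. It assumes, as above, that wildcard text is matched as an unanalyzed prefix. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical model of a whitespace-only analyzer with lowercasing.
class WhitespaceAnalyzerSketch {
    // Split on runs of whitespace only, then lowercase; hyphens are left alone.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Trailing-* wildcard modeled as an unanalyzed prefix test on each term.
    static boolean prefixMatches(List<String> terms, String prefix) {
        return terms.stream().anyMatch(term -> term.startsWith(prefix));
    }

    public static void main(String[] args) {
        List<String> terms = tokenize("Block-Major-57 indexed here");
        System.out.println(terms);                               // [block-major-57, indexed, here]
        System.out.println(prefixMatches(terms, "block-major")); // true
        System.out.println(prefixMatches(terms, "major"));       // false
    }
}
```

The choice cuts both ways: "block-major*" and "block*" now find the document, but "major*" no longer does, because "major" exists only inside the longer term.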


--
Ian.


On Fri, Sep 20, 2013 at 1:26 PM, Ramprakash Ramamoorthy
<youngestachiever@gmail.com> wrote:
> Sorry, I hit the send button accidentally last time. Please read below:
>
> Hello,
>
> We're using Lucene 4.1. We have the word "block-major-57" indexed.
> Using the classic analyzer, we get the following tokens: "block" and
> "major-57".
>
> I search for "block-major*" and the document doesn't match.
> However, searching for "block*" works perfectly. Is this a bug, or am I
> doing something wrong?
>
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> Chennai, India.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

