lucene-java-user mailing list archives

From Ramprakash Ramamoorthy <youngestachie...@gmail.com>
Subject Re: Strange behaviour of tokenizer with wildcard queries
Date Fri, 20 Sep 2013 12:48:36 GMT
On Fri, Sep 20, 2013 at 6:11 PM, Ian Lea <ian.lea@gmail.com> wrote:

> It's reasonable that "block-major" won't find anything.
> "block-major-57" should match.
>

Thank you Ian, I understand. But my question is: why wouldn't
"block-major*" match? Please note the wildcard at the end! Thanks.
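
To make the question concrete, here is a minimal plain-Java sketch of one possible explanation. It assumes the wildcard term is compared as-is against each indexed token and is not itself run through the analyzer. It makes no Lucene calls: the class name WildcardTermSketch and its helpers are hypothetical, and the token list is simply the one reported in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for term-level wildcard matching; this is NOT Lucene code.
final class WildcardTermSketch {

    // Translate a wildcard pattern (* = any run of chars, ? = exactly one char)
    // into a regex, quoting every other character literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the pattern matches at least one whole token.
    static boolean matchesAnyTerm(String wildcard, List<String> terms) {
        Pattern p = toRegex(wildcard);
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tokens reported for "block-major-57" with the classic analyzer.
        List<String> terms = Arrays.asList("block", "major-57");
        System.out.println(matchesAnyTerm("block*", terms));       // true
        System.out.println(matchesAnyTerm("block-major*", terms)); // false
    }
}
```

Under that assumption, "block*" matches the token "block", while no single token starts with "block-major", so "block-major*" finds nothing even with the trailing wildcard.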

>
> The split into block and major-57 will be because, from the javadocs
> for ClassicTokenizer, "Splits words at hyphens, unless there's a
> number in the token, in which case the whole token is interpreted as a
> product number and is not split.".  So I guess it splits on the first
> hyphen but not the second.
>
> ClassicAnalyzer/Tokenizer is general purpose and will never meet
> everyone's requirement all the time.  You could try a different
> analyzer, or build your own.  That's what the javadoc recommends.
>
>
> --
> Ian.
>
>
> On Fri, Sep 20, 2013 at 1:26 PM, Ramprakash Ramamoorthy
> <youngestachiever@gmail.com> wrote:
> > Sorry, hit the send button accidentally the last time. Please read below:
> >
> > Hello,
> >
> > We're using Lucene 4.1. We have the word "block-major-57"
> > indexed. Using the classic analyzer, we get the following tokens:
> > "block" and "major-57".
> >
> > I search for "block-major*" and the document doesn't match.
> > However, searching for "block*" works perfectly. Is this a bug, or am I
> > doing something wrong?
> >
> >
> > --
> > With Thanks and Regards,
> > Ramprakash Ramamoorthy,
> > Chennai, India.
>


-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Chennai, India
