lucene-java-user mailing list archives

From govind bhardwaj <govins...@gmail.com>
Subject Re: Issue with StandardAnalyzer which splits single word with _(Lucene Version: 3.0)
Date Mon, 22 Aug 2011 14:53:56 GMT
Hi Erick,

Thanks for your reply.

I reproduced Srinivas' result by changing the Lucene version passed to the
StandardAnalyzer constructor. With LUCENE_30, the input query 'xyz_abc' is
indeed parsed as xyz abc, whereas with LUCENE_33 the parsed query remains
'xyz_abc'. I can't figure out why the two versions behave differently.

Regards,
Govind
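
For readers of the archive, here is a minimal sketch of why the two version
settings could disagree. It does not call Lucene at all: the class and method
names (UnderscoreSplitSketch, classicLike, uax29Like) are made up for
illustration, and both methods are rough approximations under stated
assumptions. The first assumes the pre-3.1 grammar keeps '_' inside a token
only when a neighbouring run contains a digit; the second assumes the UAX #29
rules used from 3.1 onward treat '_' as a joining character. Lowercasing and
all other punctuation rules are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class UnderscoreSplitSketch {

    // True if the run contains at least one digit.
    static boolean hasDigit(String run) {
        return run.chars().anyMatch(Character::isDigit);
    }

    // Approximation of the pre-3.1 behaviour (assumption): an underscore joins
    // two runs only when at least one of them contains a digit. Runs of
    // consecutive underscores are collapsed for simplicity.
    public static List<String> classicLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{N}_]+")) {
            StringBuilder current = new StringBuilder();
            String previous = null;
            for (String part : word.split("_+")) {
                if (part.isEmpty()) {
                    continue;
                }
                if (previous != null && (hasDigit(previous) || hasDigit(part))) {
                    current.append('_').append(part);   // keep as one token
                } else {
                    if (current.length() > 0) {
                        tokens.add(current.toString()); // break at the underscore
                    }
                    current = new StringBuilder(part);
                }
                previous = part;
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
        }
        return tokens;
    }

    // Approximation of the 3.1+ behaviour (assumption): an underscore never
    // causes a break between letters or digits.
    public static List<String> uax29Like(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{N}_]+")) {
            if (word.chars().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"xyz_abc", "xyz_1abc"}) {
            System.out.println(input + " classic-like: " + classicLike(input)
                    + " uax29-like: " + uax29Like(input));
        }
    }
}
```

Under these assumptions, xyz_abc splits into two tokens only in the
classic-like method, while xyz_1abc stays whole in both, which is consistent
with the results reported in this thread. To check the real behaviour, print
the tokens produced by StandardAnalyzer constructed with each version constant.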



On Mon, Aug 22, 2011 at 7:22 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> No, that's expected. StandardAnalyzer breaks on '_' as far as I know.
>
> NOTE: the behavior changed a bit as of Solr 3.1. To get the old
> StandardAnalyzer behavior, I believe you need ClassicAnalyzer...
>
> More than you ever want to know about breaking words (3.1+):
> http://unicode.org/reports/tr29/#Word_Boundaries
> Linked to from:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
>
>
> Best
> Erick
>
> On Mon, Aug 22, 2011 at 1:47 AM,  <srinu.hello@gmail.com> wrote:
> > Hello All,
> >
> > I observed some unexpected behavior when using StandardAnalyzer to parse
> > a query. Here is a demonstration.
> >
> > I am passing the query as (key:xyz_abc) && (text:blabla)
> >
> > Expecting the parsed query to be +key:xyz_abc +text:blabla
> >
> > Actual Result is +key:"xyz abc" +text:blabla
> >
> > As I understand it, StandardAnalyzer splits text into multiple words at
> > word boundaries, but xyz_abc is a single word. Please correct me if I am
> > wrong.
> >
> > I also observed that if a digit follows the underscore, the parsed query
> > is as expected. That is, if I give the query
> > (key:xyz_1abc) && (text:blabla)
> > the parsed query is +key:xyz_1abc +text:blabla
> >
> > This is the behavior I am expecting.
> >
> > Please help.
> >
> > Thanks,
> > Srinivas
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>


-- 
No trees were harmed in the creation of this message, but several thousand
electrons were mildly inconvenienced.
