lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Issue with StandardAnalyzer which splits single word with _(Lucene Version: 3.0)
Date Mon, 22 Aug 2011 13:52:22 GMT
No, that's expected. StandardAnalyzer breaks on '_' as far as I know.
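
As for why xyz_1abc survives while xyz_abc gets split: as far as I understand the 3.0 grammar, pieces joined by '_' stay together as one token only when a piece next to the underscore contains a digit (a rule aimed at serial and model numbers). Here is a rough, Lucene-free sketch of that rule of thumb. The class ClassicUnderscoreSketch and its methods are hypothetical names made up for illustration, and the real JFlex grammar is more involved, so treat this as a toy model of the two cases reported in the quoted message, not as the tokenizer itself:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene code and does not call any Lucene API.
// Assumption: a '_' joins two neighbouring pieces only if one of them has a digit.
class ClassicUnderscoreSketch {

    // True if the piece contains at least one digit.
    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Splits a term on '_' and re-joins neighbours when either side has a digit.
    static List<String> tokenize(String term) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String prev = null;
        for (String part : term.split("_")) {
            if (part.isEmpty()) {
                continue; // leading or doubled underscores are outside this sketch's scope
            }
            if (prev == null) {
                current.append(part);
            } else if (hasDigit(prev) || hasDigit(part)) {
                current.append('_').append(part); // digit nearby: keep as one token
            } else {
                tokens.add(current.toString());   // letters on both sides: split here
                current.setLength(0);
                current.append(part);
            }
            prev = part;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("xyz_abc"));  // prints [xyz, abc]
        System.out.println(tokenize("xyz_1abc")); // prints [xyz_1abc]
    }
}
```

The two printed lists line up with the parsed queries reported in the quoted message: two terms (hence the phrase "xyz abc") in the first case, a single term in the second.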

NOTE: the behavior changed a bit as of Lucene/Solr 3.1. To get the old
(pre-3.1) StandardAnalyzer behavior, I believe you need ClassicAnalyzer.

More than you ever wanted to know about word breaking (3.1+):
http://unicode.org/reports/tr29/#Word_Boundaries
Linked to from:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory


Best,
Erick

On Mon, Aug 22, 2011 at 1:47 AM,  <srinu.hello@gmail.com> wrote:
> Hello All,
> I observed some unexpected behavior when using StandardAnalyzer to parse
> a query. Here is a demonstration.
>
> I am passing the query as (key:xyz_abc) && (text:blabla)
>
> Expecting the parsed query to be +key:xyz_abc +text:blabla
>
> Actual result is +key:"xyz abc" +text:blabla
>
> As I understand it, StandardAnalyzer splits text at word boundaries into
> multiple words, but the above word xyz_abc is a single word. Please correct
> me if I am wrong.
>
> I also observed that if a number follows the underscore, the parsed query
> is as expected, i.e.
>
> If I give the query as (key:xyz_1abc) && (text:blabla), the parsed query is
> +key:xyz_1abc +text:blabla
>
> This is the behavior I am expecting.
>
> Please help.
>
> Thanks,
> Srinivas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
