lucene-general mailing list archives

From "Sendros, Jason" <Jason.Send...@VerizonWireless.com>
Subject RE: StandardAnalyzer splits word. EX: key:abc_xyz converts into key:abc xyz
Date Mon, 22 Aug 2011 13:30:33 GMT
Hi Srinu,

The StandardAnalyzer treats the underscore as a word separator, which
is why you are seeing this behavior. Your other scenario, where a
number follows the underscore, is a case where the StandardAnalyzer
decides that, even though there is an underscore, the entire string
should be kept together as one token because the number changes what
kind of token that string could be (e.g. "company name", "word",
"abbreviation", etc.).
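
To make the rule above concrete, here is a rough sketch in plain Java
that only mimics the two cases from this thread. It is not Lucene code
and not how the StandardAnalyzer is implemented; the class and method
names (UnderscoreSplitSketch, tokenize, hasDigit) are made up for
illustration, and the real tokenizer grammar has many more cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics the behavior observed in this
// thread, not the actual StandardAnalyzer grammar.
public class UnderscoreSplitSketch {

    // Returns true if the term contains at least one digit.
    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Splits on whitespace first. A term that contains a digit is kept
    // whole; any other term is additionally split at underscores.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (hasDigit(term)) {
                tokens.add(term);
            } else {
                for (String part : term.split("_")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("xyz_abc"));   // [xyz, abc]
        System.out.println(tokenize("xyz_1abc"));  // [xyz_1abc]
    }
}
```

The two-token result for xyz_abc corresponds to the phrase query
key:"xyz abc" in your output, and the single token for xyz_1abc
corresponds to key:xyz_1abc.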

Check this discussion for an understanding of the underscore being a
word separator:
http://www.gossamer-threads.com/lists/lucene/java-dev/72438

And here you can find the StandardAnalyzer class for the most recent
version of Lucene:
http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/analysis/standard/StandardAnalyzer.html

Hopefully reading through those links helps you understand what's
happening within Lucene. To solve this, try using a different analyzer
that suits your needs or perhaps modifying the StandardAnalyzer to
follow the rules you prefer.
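
One way to picture the "different analyzer" option is to analyze the
key field differently from the text field. The sketch below is plain
Java, not Lucene code: PerFieldTokenizerSketch, keepWhole and
splitOnSeparators are hypothetical names standing in for a per-field
choice of analyzer. In Lucene itself, the KeywordAnalyzer and
PerFieldAnalyzerWrapper classes are probably the place to start
looking, but please check the javadocs for your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: picks a tokenizing function per field.
public class PerFieldTokenizerSketch {

    // Stand-in for a keyword-style analyzer: the whole value is one token.
    static List<String> keepWhole(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    // Stand-in for a splitting analyzer: breaks on whitespace and underscores.
    static List<String> splitOnSeparators(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[\\s_]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldTokenizerSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Registers a tokenizer to use for one specific field name.
    void register(String field, Function<String, List<String>> tokenizer) {
        perField.put(field, tokenizer);
    }

    // Uses the field's registered tokenizer, or the fallback if none is set.
    List<String> tokenize(String field, String value) {
        return perField.getOrDefault(field, fallback).apply(value);
    }

    public static void main(String[] args) {
        PerFieldTokenizerSketch sketch =
                new PerFieldTokenizerSketch(PerFieldTokenizerSketch::splitOnSeparators);
        sketch.register("key", PerFieldTokenizerSketch::keepWhole);
        System.out.println(sketch.tokenize("key", "xyz_abc"));   // [xyz_abc]
        System.out.println(sketch.tokenize("text", "bla_bla"));  // [bla, bla]
    }
}
```

Whatever you choose, the same analysis needs to be applied at index
time and at query time, otherwise the terms will not match.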

Jason

-----Original Message-----
From: srinu.hello [mailto:srinu.hello@gmail.com] 
Sent: Saturday, August 20, 2011 10:11 AM
To: general@lucene.apache.org
Subject: StandardAnalyzer splits word. EX: key:abc_xyz converts into key:abc xyz

Hello All,

I observed some unexpected behavior when using StandardAnalyzer to
parse a query. Here is a demonstration.

I am passing the query as (key:xyz_abc) && (text:blabla)

Expecting the parsed query to be +key:xyz_abc +text:blabla

Actual Result is +key:"xyz abc" +text:blabla

As per my understanding, StandardAnalyzer splits text at word
boundaries into multiple words, but the above word xyz_abc is a single
word. Please correct me if I am wrong.

I also observed that if a number follows the underscore, the parsed
query is as expected, i.e.:

If I give the query as (key:xyz_1abc) && (text:blabla), the parsed
query is +key:xyz_1abc +text:blabla

This is the behavior I am expecting.

Please help.

Thanks,
Srinivas




--
View this message in context:
http://lucene.472066.n3.nabble.com/StandardAnalyzer-splits-word-EX-key-abc-xyz-converts-into-key-abc-xyz-tp3270609p3270609.html
Sent from the Lucene - General mailing list archive at Nabble.com.
