lucene-java-user mailing list archives

From: Steven A Rowe
Subject: RE: Lucene tokenization
Date: Tue, 27 Mar 2012 18:11:58 GMT

Hi Nilesh,

Which version of Lucene are you using?  StandardTokenizer behavior changed in v3.1.
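
One possible explanation, assuming the pre-3.1 grammar is in use (it is still available as ClassicTokenizer from 3.1 onward): that grammar has a NUM rule that keeps alphanumeric segments joined across a single punctuation character (_ - / . ,) only when every other segment contains a digit. The sketch below is a rough, ASCII-only approximation of just the ALPHANUM and NUM rules with longest-match selection. The class name ClassicNumRuleSketch and its regexes are hypothetical stand-ins for illustration, not Lucene code, and the other rules of the grammar (hosts, e-mail addresses, acronyms, apostrophes) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: approximates two rules of the classic
// (pre-3.1) StandardTokenizer grammar. It does not use or call Lucene.
final class ClassicNumRuleSketch {
    // ASCII-only stand-ins (assumption; the real grammar uses Unicode classes).
    private static final String A = "[A-Za-z0-9]+";                  // any alphanumeric run
    private static final String H = "[A-Za-z0-9]*[0-9][A-Za-z0-9]*"; // run with at least one digit
    private static final String P = "[_\\-/.,]";                     // one joining punctuation char

    // ALPHANUM plus the NUM alternatives: segments alternate around P,
    // and every other segment must contain a digit.
    private static final Pattern[] RULES = {
        Pattern.compile(A),
        Pattern.compile(A + P + H),
        Pattern.compile(H + P + A),
        Pattern.compile(A + "(?:" + P + H + P + A + ")+"),
        Pattern.compile(H + "(?:" + P + A + P + H + ")+"),
        Pattern.compile(A + P + H + "(?:" + P + A + P + H + ")+"),
        Pattern.compile(H + P + A + "(?:" + P + H + P + A + ")+"),
    };

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            // Longest match among all rules at the current position.
            int best = 0;
            for (Pattern rule : RULES) {
                Matcher m = rule.matcher(text).region(pos, text.length());
                if (m.lookingAt()) {
                    best = Math.max(best, m.end() - pos);
                }
            }
            if (best == 0) {
                pos++; // no rule starts here: skip the character
                continue;
            }
            tokens.add(text.substring(pos, pos + best));
            pos += best;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("01a_b-_-c-d")); // [01a_b, c, d]
        System.out.println(tokenize("a_b-_-c_d"));   // [a, b, c, d]
    }
}
```

Under these assumptions, 01a_b survives as one token because 01a contains a digit, while a_b has no digit in either segment and splits at the underscore. The run -_- never joins anything, since the rule allows only a single punctuation character between segments.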


-----Original Message-----
From: Nilesh Vijaywargiay
Sent: Tuesday, March 27, 2012 2:04 PM
Subject: Lucene tokenization

I have the string 01a_b-_-c-d, which is tokenized as: 01a_b c d

and the string a_b-_-c_d, which is tokenized as: a b c d

Why is there a difference when there is a digit at the beginning? I am using the standard (unstemmed) analyzer.
