lucene-java-user mailing list archives

From Nilesh Vijaywargiay <nilesh.vi...@gmail.com>
Subject Lucene tokenization
Date Tue, 27 Mar 2012 18:03:30 GMT
I have the string 01a_b-_-c-d, which is tokenized as
01a_b
c
d

and the string a_b-_-c_d, which is tokenized as
a
b
c
d

Why is there a difference when the string begins with a digit? I am
using the standard tokenizer, without stemming.
