lucene-java-user mailing list archives

From "Bhavin Pandya" <bhav...@rediff.co.in>
Subject How to tokenize with comma in standard tokenizer
Date Mon, 17 Sep 2007 14:55:48 GMT
Hi,

The standard tokenizer works pretty well for me, but I have found one problem with my usage.

I want to tokenize "TheRing6,Proposal6,GuyandGirl6" as three separate tokens, but the standard
analyzer treats it as a single word because the token contains a digit.

Expected three tokens:
1. thering6
2. proposal6
3. guyandgirl6
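
To illustrate the output I am after, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name CommaSplitSketch and its tokenize method are made up for illustration, and it simply splits the text on commas and lowercases each piece, which is only an approximation of what I would like the standard tokenizer to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class CommaSplitSketch {

    // Splits the input on commas, trims and lowercases each piece,
    // and skips pieces that end up empty.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.split(",")) {
            String token = part.trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [thering6, proposal6, guyandgirl6]
        System.out.println(tokenize("TheRing6,Proposal6,GuyandGirl6"));
    }
}
```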

I want to change this behaviour of the standard tokenizer for this purpose, but I don't know
where to make the change.
Do I need to comment out some rule in the StandardTokenizer.jj file? I am confused by this file.

Any pointers would be appreciated.

- Bhavin

