lucene-java-user mailing list archives

From Diego Fernandez <difer...@redhat.com>
Subject Extending StandardTokenizer Jflex to not split on '/'
Date Fri, 14 Feb 2014 18:42:32 GMT
Hi guys, this is my first time posting on the Lucene list, so hello everyone.

I really like the way the StandardTokenizer works, but I'd like it not to split
tokens on / (forward slash).  I've been reading http://unicode.org/reports/tr29/#Default_Word_Boundaries
to try to understand the rules, but I'm either misunderstanding or missing something.  If
I understand correctly, a symbol in MidLetter does not split a token as long as
there are alphabetic characters on both sides of it.  I tried adding the forward slash to the
MidLetter and MidLetterSupp rules (in different combinations), but the tokenizer still seems
to split on it.
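
To make the MidLetter behaviour described above concrete, here is a minimal, self-contained Java sketch of the idea behind UAX #29 rules WB6 and WB7. It is only an illustration under simplifying assumptions, not Lucene's JFlex grammar or the StandardTokenizer implementation: the class name MidLetterSketch, the method tokenize, and the midLetters parameter are hypothetical, and anything that is not a letter (digits included) is treated as a separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class MidLetterSketch {

    // Splits text into runs of letters. A character listed in midLetters
    // is kept inside a token only when a letter comes directly before it
    // and directly after it (a simplified reading of WB6/WB7).
    static List<String> tokenize(String text, String midLetters) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (midLetters.indexOf(c) >= 0
                    && current.length() > 0            // previous char was a letter
                    && i + 1 < text.length()
                    && Character.isLetter(text.charAt(i + 1))) {
                current.append(c);                     // flanked by letters: no break
            } else if (current.length() > 0) {
                tokens.add(current.toString());        // any other char ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With "/" passed as a mid-letter character, tokenize("read/write access", "/") keeps read/write together as one token; with an empty midLetters string the same input is split at the slash. A slash that is not followed by a letter, as in "a/ b", still ends the token.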

Does anyone have any tips or ideas?

Thanks

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


