lucene-java-user mailing list archives

From Armins Stepanjans
Subject Looking For Tokenizer With Custom Delimiter
Date Mon, 08 Jan 2018 10:21:35 GMT

I am looking for a tokenizer that lets me specify the delimiters on which
the text is split. For example, if I choose ' ' and '_' as the delimiters,
the following string:
"foo__bar doo"
would be tokenized into:
"foo", "", "bar", "doo"
(The analyzer could filter out the empty tokens afterwards, since keeping
the empty-string token is not critical.)
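
The behavior described above can be sketched in plain Java to make the
expected output concrete. This is only an illustration built on the
standard library, not a Lucene tokenizer: the class DelimiterSplitSketch
and its methods tokenize and dropEmpty are hypothetical names invented for
this example, and passing the delimiters as a String of characters is an
assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, not a Lucene Tokenizer. All names are hypothetical.
final class DelimiterSplitSketch {

    // Splits text at every character contained in delimiters, keeping empty tokens.
    static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (delimiters.indexOf(c) >= 0) {
                // A delimiter ends the current token, even if that token is empty.
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Emit whatever follows the last delimiter.
        tokens.add(current.toString());
        return tokens;
    }

    // Stands in for the optional "filter empty tokens" step.
    static List<String> dropEmpty(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!token.isEmpty()) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("foo__bar doo", " _");
        System.out.println(tokens);            // [foo, , bar, doo]
        System.out.println(dropEmpty(tokens)); // [foo, bar, doo]
    }
}
```

The empty token appears because the two consecutive '_' characters enclose
nothing. Dropping it afterwards corresponds to the filtering step mentioned
above.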

Is such functionality built into Lucene (I'm working with 7.1.0) and does
this seem like the correct approach to the problem?

