lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: How to prevent WordDelimiterFilter tokenize the string with underscore?
Date Thu, 16 Jun 2016 05:02:16 GMT
Hi,

You can supply custom character types.
Please see WordDelimiterFilterFactory and wdfftypes.txt for an example.
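
As a rough sketch (mapping the underscore to ALPHA is an assumption here, not something confirmed in this thread), a types file that stops the filter from treating the underscore as a subword delimiter could look like this:

```
# Treat the underscore as a letter instead of a subword delimiter.
_ => ALPHA
```

With the factory, the file is referenced through its types argument (for example types="wdfftypes.txt"). When WordDelimiterFilter is constructed directly in Java, the equivalent would presumably be to pass a custom character type table, with the entry for '_' set to ALPHA, to the constructor that accepts one.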

ahmet


On Wednesday, June 15, 2016 10:32 PM, Xiaolong Zheng <zhengxiaolong@gmail.com> wrote:
Hi,

How can I prevent WordDelimiterFilter from tokenizing a string that contains
underscores, e.g. word_with_underscore?

I am using WordDelimiterFilter to build my own camel-case analyzer, with the
following configuration flags:

flags |= GENERATE_WORD_PARTS;
flags |= SPLIT_ON_CASE_CHANGE;
flags |= PRESERVE_ORIGINAL;


But I realized that a side effect of using SPLIT_ON_CASE_CHANGE is that it
also splits strings on underscores.

How can I prevent it from splitting strings that contain underscores?




Sincerely,

--Xiaolong


