lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: How to customize the delimiters used by the WordDelimiterFilter in Lucene?
Date Sat, 18 Mar 2017 22:20:05 GMT
Hi,

Maybe look at the factory class to see how the types argument is handled?
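As far as I remember, the factory turns the types rules into a byte[] table: it starts from the default type of every character and then overwrites the entries named in the rules. A minimal sketch of that idea follows, kept free of Lucene imports so it runs on its own. The class name, the helper names, and the simplified defaultType are made up for illustration, and the constant values are assumed to mirror the ones in WordDelimiterFilter, so please check them against your Lucene version.

```java
import java.util.Map;
import java.util.TreeMap;

public class CharTypeTableSketch {
    // Assumed copies of the type constants in Lucene's WordDelimiterFilter.
    // In real code, reference WordDelimiterFilter.LOWER etc. instead.
    static final byte LOWER = 0x01;
    static final byte UPPER = 0x02;
    static final byte DIGIT = 0x04;
    static final byte SUBWORD_DELIM = 0x08;
    static final byte ALPHA = 0x03; // LOWER | UPPER

    // Simplified default classification (hypothetical helper, not Lucene's own):
    // everything that is not a letter or digit counts as a delimiter.
    static byte defaultType(int ch) {
        if (Character.isLowerCase(ch)) return LOWER;
        if (Character.isUpperCase(ch)) return UPPER;
        if (Character.isDigit(ch)) return DIGIT;
        if (Character.isLetter(ch)) return ALPHA;
        return SUBWORD_DELIM;
    }

    // Build a table of at least 256 entries, fill it with the defaults,
    // then apply the per-character overrides on top.
    static byte[] build(Map<Character, Byte> overrides) {
        int size = 256;
        for (Character c : overrides.keySet()) {
            size = Math.max(size, c.charValue() + 1);
        }
        byte[] table = new byte[size];
        for (int i = 0; i < size; i++) {
            table[i] = defaultType(i);
        }
        for (Map.Entry<Character, Byte> e : overrides.entrySet()) {
            table[e.getKey().charValue()] = e.getValue();
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Character, Byte> overrides = new TreeMap<>();
        overrides.put('+', ALPHA); // treat '+' as a letter so "C++" stays whole
        byte[] table = build(overrides);
        System.out.println("'+' -> " + table['+'] + ", '-' -> " + table['-']);
    }
}
```

The resulting array would then be passed as the charTypeTable argument of the WordDelimiterFilter constructor. This can only help if the tokenizer in front of the filter keeps the '+' characters in the token, so a whitespace-based tokenizer is assumed here.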

Ahmet


On Friday, March 17, 2017 11:05 PM, "phauly@mailbox.org" <phauly@mailbox.org> wrote:



Hi,


I am trying to index words like 'e-mail' as 'email', 'e mail' and 'e-mail' with Lucene 4.4.0.


Lucene's WordDelimiterFilter should be ideal for this. However, it treats every(?) non-alphanumeric
character as a delimiter. So, terms like 'C++' are transformed to 'C', which is not what I
want.


Apparently, Solr allows custom delimiters to be specified. But how can I do this in plain Lucene?


I have looked into the documentation, and the 'byte[] charTypeTable' parameter of the constructor
looked promising. But specifying some delimiters in a charTypeTable seems to have no effect.


Thank you!


---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

For additional commands, e-mail: java-user-help@lucene.apache.org
