lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory
Date Sat, 25 Nov 2017 02:23:17 GMT


Hi Zheng,

UAX29URLEmailTokenizer recognizes URLs and e-mail addresses. It does not split them; each one is kept as a single token.

StandardTokenizer produces two or more tokens for such an entity.

Please try them on the analysis page and use whichever one suits your requirements.
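
For the comparison on the analysis page, two field types along the following lines could be defined side by side. This is only a sketch: the field type names text_email_uax29 and text_email_classic are hypothetical, and the positionIncrementGap value is an assumption, not something taken from your schema. The tokenizer and filter factory classes are the ones named in this thread.

```xml
<!-- Hypothetical field type: URLs and e-mail addresses are kept as single tokens -->
<fieldType name="text_email_uax29" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field type: the setup currently in use, kept for comparison -->
<fieldType name="text_email_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Entering a sample address from one of the EML files against each field type on the analysis page should show how each tokenizer handles it.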

Ahmet



On Friday, November 24, 2017, 11:46:57 AM GMT+3, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:

Hi,

I am indexing email addresses into Solr from EML files. Currently, I am
using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
found that we can also use UAX29URLEmailTokenizerFactory with
LowerCaseFilterFactory.

Does anyone have a recommendation on which tokenizer is better?

I am currently using Solr 6.5.1, and planning to upgrade to Solr 7.1.0.

Regards,
Edwin
