lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory
Date Sat, 25 Nov 2017 04:32:49 GMT
Hi Rick,

Neither tokenizer splits on the hyphen in an email address like
this:
solr-user@lucene.apache.org

The entire email address remains intact with both tokenizers.

Regards,
Edwin

On 24 November 2017 at 20:19, Rick Leir <rleir@leirtech.com> wrote:

> Edwin
> There is a spec for which characters are acceptable in an email name, and
> another spec for chars in a domain name. I suspect you will have more
> success with a tokenizer which is specialized for email, but I have not
> looked at UAX29URLEmailTokenizerFactory. Does ClassicTokenizerFactory split
> on hyphens?
> Cheers --Rick
>
> On November 24, 2017 3:46:46 AM EST, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com> wrote:
> >Hi,
> >
> >I am indexing email addresses into Solr via EML files. Currently, I am
> >using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
> >found that we can also use UAX29URLEmailTokenizerFactory with
> >LowerCaseFilterFactory.
> >
> >Does anyone have any recommendation on which Tokenizer is better?
> >
> >I am currently using Solr 6.5.1, and planning to upgrade to Solr 7.1.0.
> >
> >Regards,
> >Edwin
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
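
For reference, the two analysis chains compared in this thread could be declared in a Solr schema roughly as follows. This is only a minimal sketch: the field type names text_email_uax29 and text_email_classic are hypothetical and do not come from the thread; only the tokenizer and filter factory classes are taken from the messages above.

```xml
<!-- Hypothetical field type name; the chain mirrors the UAX29 option from the thread -->
<fieldType name="text_email_uax29" class="solr.TextField">
  <analyzer>
    <!-- Tokenizer that recognizes URLs and email addresses as single tokens -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <!-- Lowercase every token so matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field type name; the chain mirrors the setup currently in use in the thread -->
<fieldType name="text_email_classic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One way to compare the two is to select each field type on the Analysis screen of the Solr admin UI and enter an address such as solr-user@lucene.apache.org, then check the resulting tokens on the Solr version in use.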
