lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: Email id tokenizer (actual email id & multiple terms)
Date Thu, 22 Dec 2016 00:47:07 GMT
On Wed, Dec 21, 2016 at 11:23 PM, suriya prakash <suriya3x@gmail.com> wrote:
> Hi,
>
> Thanks for your reply.
>
> I might have one or more email ids in a single record.

Just so you know, you can add the same field to a document more than
once, with that field analysed by KeywordAnalyzer, and each value will
still become its own token. This is safer than something like
WhitespaceAnalyzer, because email addresses can actually contain spaces
(in a quoted local part such as "john doe"@example.com).
(UAX29URLEmailAnalyzer might do the right thing though.)
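A minimal sketch of the multi-valued field approach, assuming a Lucene 6.x-era API with lucene-core and lucene-analyzers-common on the classpath (module and package names shift somewhat in later releases). The class name, the "email" field name and the sample addresses are made up for illustration, and the temporary index directory is not cleaned up. Each address is added as a separate value of the same field, KeywordAnalyzer keeps each value as one token, so an exact TermQuery on the full address matches, even one with a space in it; tokens() shows how WhitespaceAnalyzer would split that same address.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MultiValuedEmailDemo {

  /** Indexes one record with each address as a separate "email" value, then counts exact matches. */
  public static int countExact(String[] emails, String wanted) {
    // "email" uses KeywordAnalyzer (whole value = one token); any other field falls back to whitespace.
    Map<String, Analyzer> perField =
        Collections.<String, Analyzer>singletonMap("email", new KeywordAnalyzer());
    try (Analyzer analyzer = new PerFieldAnalyzerWrapper(new WhitespaceAnalyzer(), perField);
         Directory dir = FSDirectory.open(Files.createTempDirectory("email-demo"))) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        for (String email : emails) {
          doc.add(new TextField("email", email, Field.Store.NO)); // same field name, once per address
        }
        writer.addDocument(doc);
      } // closing the writer commits the document
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).count(new TermQuery(new Term("email", wanted)));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Lists the terms an analyzer produces for a single "email" value. */
  public static List<String> tokens(Analyzer analyzer, String text) {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("email", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    String[] emails = {"alice@example.com", "\"john doe\"@example.com"};
    System.out.println(countExact(emails, "\"john doe\"@example.com"));
    System.out.println(tokens(new WhitespaceAnalyzer(), emails[1]));
  }
}
```

For a field that is never meant to be tokenised, StringField is a common alternative to TextField plus KeywordAnalyzer: it indexes each value as a single untokenised term without needing a per-field analyzer.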

But if the addresses are embedded in the main text content field,
TeeSinkTokenFilter does seem like the right thing to use: one
tokenisation pass can feed both the content field and a separate
email field. (I have never found a use for it myself.)
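A rough sketch of the TeeSinkTokenFilter route, under the same Lucene version assumption. EmailOnlyFilter is a hypothetical helper that keeps any token containing '@', a deliberately crude stand-in for real address detection, and WhitespaceTokenizer is used only to keep the example short. The tee passes every token through (that stream would feed the main content field) while caching each one; the sink replays the cached tokens through the email filter. The tee has to be consumed before the sink.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.sinks.TeeSinkTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TeeSinkEmailDemo {

  /** Hypothetical filter: keeps only tokens containing '@' (crude, for illustration only). */
  static final class EmailOnlyFilter extends TokenFilter {
    private final CharTermAttribute term = addAttribute(CharTermAttribute.class);

    EmailOnlyFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      while (input.incrementToken()) {
        if (term.toString().indexOf('@') >= 0) {
          return true; // looks like an address: pass it on
        }
      }
      return false;
    }
  }

  /** Runs a stream through reset/incrementToken/end and collects its terms. */
  static List<String> drain(TokenStream ts) throws IOException {
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    List<String> out = new ArrayList<>();
    ts.reset();
    while (ts.incrementToken()) {
      out.add(term.toString());
    }
    ts.end();
    return out;
  }

  /** Returns [all tokens, email-like tokens] from a single tokenisation pass. */
  public static List<List<String>> split(String text) {
    try {
      Tokenizer tokenizer = new WhitespaceTokenizer();
      tokenizer.setReader(new StringReader(text));
      TeeSinkTokenFilter tee = new TeeSinkTokenFilter(tokenizer);
      TokenStream emails = new EmailOnlyFilter(tee.newSinkTokenStream());

      List<String> all = drain(tee);           // consume the tee first; it caches every token
      List<String> onlyEmails = drain(emails); // the sink replays the cached tokens, filtered
      tee.close();
      emails.close();

      List<List<String>> result = new ArrayList<>();
      result.add(all);
      result.add(onlyEmails);
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<List<String>> r = split("contact alice@example.com or bob@example.org today");
    System.out.println("all:    " + r.get(0));
    System.out.println("emails: " + r.get(1));
  }
}
```

When indexing rather than just printing, each stream would typically be wrapped with TextField(String, TokenStream), and the tee's field added to the document before the sink's field so that the tee is consumed first.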

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

