> Any thoughts?
The best idea I have is to tokenize with ICUTokenizer, which tags
emoji sequences with the "<EMOJI>" token type, then use
ConditionalTokenFilter to send every token EXCEPT those of type
"<EMOJI>" to your WordDelimiterFilter. That way WordDelimiterFilter
never sees the emoji at all and can't screw them up.
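
To make the routing idea concrete, here is a rough plain-Java model of it.
It deliberately does not use the Lucene classes: Token, splitOnNonAlphanumerics
and route are hypothetical stand-ins I made up for illustration, and the
splitter is only a crude imitation of a word-delimiter filter. In Lucene
itself I'd expect the equivalent to be a ConditionalTokenFilter subclass whose
shouldFilter() checks the token's TypeAttribute, but treat that as a pointer
rather than tested code.

```java
import java.util.ArrayList;
import java.util.List;

public class EmojiRoutingSketch {

    /** Hypothetical stand-in for a token: its text plus its token type. */
    public static final class Token {
        public final String text;
        public final String type;

        public Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Token type string taken from the message above. */
    public static final String EMOJI_TYPE = "<EMOJI>";

    /**
     * Crude imitation of a word-delimiter filter (NOT the Lucene one):
     * keeps runs of letters/digits and treats everything else as a
     * delimiter, so an emoji-only token produces no output at all.
     */
    public static List<Token> splitOnNonAlphanumerics(Token t) {
        List<Token> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < t.text.length(); ) {
            int cp = t.text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                parts.add(new Token(current.toString(), t.type));
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            parts.add(new Token(current.toString(), t.type));
        }
        return parts;
    }

    /**
     * The conditional step: emoji-typed tokens bypass the splitter
     * untouched; every other token is sent through it.
     */
    public static List<Token> route(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            if (EMOJI_TYPE.equals(t.type)) {
                out.add(t);
            } else {
                out.addAll(splitOnNonAlphanumerics(t));
            }
        }
        return out;
    }

    /** Convenience helper: just the token texts, in order. */
    public static List<String> texts(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> input = new ArrayList<>();
        input.add(new Token("wi-fi", "<ALPHANUM>"));
        input.add(new Token("\uD83D\uDC4D", EMOJI_TYPE)); // thumbs-up emoji
        System.out.println(texts(route(input)));
    }
}
```

The point of the model is only the branch in route(): the decision is made on
the token type set by the tokenizer, so the splitter's rules never get applied
to the emoji.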