lucene-solr-user mailing list archives

From Jacob Elder
Subject Good example of multiple tokenizers for a single field
Date Mon, 29 Nov 2010 22:15:26 GMT
I am looking for a clear example of using more than one tokenizer for a
single source field. My application has a single "body" field which, until
recently, contained only Latin characters, but we're now encountering both
English and Japanese words in a single message. Obviously, we need to be
using CJK tokenization in addition to WhitespaceTokenizerFactory.

I've found some references to using copyField or NGrams, but I can't quite
grasp what the whole solution would look like.
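
A rough way to picture the copyField-plus-NGram idea is the toy Python sketch
below. It is not Solr configuration and not a confirmed solution: the function
names, the "body_cjk" field name, and the character ranges are all assumptions
made for illustration. It only shows the shape of the approach, where the same
source text is analyzed twice, once split on whitespace and once as overlapping
two-character grams over runs of Japanese characters, with each result stored
under its own field.

```python
import re

# Hiragana, Katakana, and CJK Unified Ideographs. This range is an
# illustrative assumption and is deliberately narrow.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")


def whitespace_tokens(text):
    """Split on whitespace only, as a whitespace tokenizer would."""
    return text.split()


def cjk_bigrams(text):
    """Emit overlapping 2-character grams for each run of CJK characters."""
    tokens = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            # A lone character has no bigram, so keep it as a unigram.
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def index_body(body):
    """Analyze one source string two ways, like copying it to a second field."""
    return {
        "body": whitespace_tokens(body),   # hypothetical whitespace field
        "body_cjk": cjk_bigrams(body),     # hypothetical CJK bigram field
    }


if __name__ == "__main__":
    print(index_body("hello 東京都 world"))
```

A query would then have to search both fields, which is the part of the
design that the copyField references seem to be pointing at.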

Jacob Elder
(646) 535-3379
