lucene-java-user mailing list archives

From "Teruhiko Kurosaka" <>
Subject How do TeeTokenizer and SinkTokenizer work?
Date Fri, 22 Aug 2008 19:47:32 GMT
I'm interested in knowing how these tokenizers work together.
The API doc for TeeTokenFilter has this sample code:
SinkTokenizer sink1 = new SinkTokenizer(null);
SinkTokenizer sink2 = new SinkTokenizer(null);

TokenStream source1 = new TeeTokenFilter(
    new TeeTokenFilter(new WhitespaceTokenizer(reader1), sink1), sink2);
TokenStream source2 = new TeeTokenFilter(
    new TeeTokenFilter(new WhitespaceTokenizer(reader2), sink1), sink2);

TokenStream final3 = new EntityDetect(sink1);
TokenStream final4 = new URLDetect(sink2);

with an explanation that reads "sink1 and sink2 will both get tokens from both reader1 and
reader2 after whitespace tokenizer",
but I don't understand how the input from reader1 and reader2 is mixed together.
Will sink1 return the tokens from reader1 first, and then those from reader2?
Or are they mixed randomly?
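
To make the question concrete, here is a toy model in plain Java. It is not Lucene code: the Sink and Tee classes and the whitespaceTokens helper are made up for illustration, and the model uses a single sink instead of two. It assumes that a tee copies each token into its sink at the moment the token is pulled through, which I have not confirmed against the Lucene source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TeeSinkModel {

    /** Hypothetical sink: remembers every token handed to it, in arrival order. */
    static class Sink {
        final List<String> tokens = new ArrayList<String>();
        void add(String token) { tokens.add(token); }
    }

    /** Hypothetical tee: passes tokens through unchanged and copies each one into a sink. */
    static class Tee implements Iterator<String> {
        private final Iterator<String> input;
        private final Sink sink;

        Tee(Iterator<String> input, Sink sink) {
            this.input = input;
            this.sink = sink;
        }

        public boolean hasNext() { return input.hasNext(); }

        public String next() {
            String token = input.next();
            sink.add(token);            // ASSUMPTION: copy happens as the token is consumed
            return token;
        }

        public void remove() { throw new UnsupportedOperationException(); }
    }

    /** Stand-in for a whitespace tokenizer over a reader. */
    static Iterator<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    /** Consume source1 completely, then source2. */
    static List<String> drainInOrder() {
        Sink sink1 = new Sink();
        Iterator<String> source1 = new Tee(whitespaceTokens("a b"), sink1);
        Iterator<String> source2 = new Tee(whitespaceTokens("x y"), sink1);
        while (source1.hasNext()) source1.next();
        while (source2.hasNext()) source2.next();
        return sink1.tokens;
    }

    /** Consume one token from each source in turn. */
    static List<String> drainAlternating() {
        Sink sink1 = new Sink();
        Iterator<String> source1 = new Tee(whitespaceTokens("a b"), sink1);
        Iterator<String> source2 = new Tee(whitespaceTokens("x y"), sink1);
        while (source1.hasNext() || source2.hasNext()) {
            if (source1.hasNext()) source1.next();
            if (source2.hasNext()) source2.next();
        }
        return sink1.tokens;
    }

    public static void main(String[] args) {
        System.out.println(drainInOrder());      // prints [a, b, x, y]
        System.out.println(drainAlternating());  // prints [a, x, b, y]
    }
}
```

If the real classes behave like this model, the order of tokens in sink1 would be determined entirely by the order in which source1 and source2 are consumed, and nothing would be random. Is that the case?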

