lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Teruhiko Kurosaka" <K...@basistech.com>
Subject How do TeeTokenizer and SinkTokenizer work?
Date Fri, 22 Aug 2008 19:47:32 GMT
Hello,
I'm interested in knowing how these tokenizers work together.
The API doc for TeeTokenizer
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/analysis/TeeTokenFilter.html

has this sample code:
SinkTokenizer sink1 = new SinkTokenizer(null);
SinkTokenizer sink2 = new SinkTokenizer(null);

TokenStream source1 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader1),
sink1), sink2);
TokenStream source2 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader2),
sink1), sink2);

TokenStream final3 = new EntityDetect(sink1);
TokenStream final4 = new URLDetect(sink2);

with an explanation that reads "sink1 and sink2 will both get tokens from both reader1 and
reader2 after whitespace tokenizer",
but I don't understand how the input from reader1 and reader2 are mixed together.
Will sink1 first reaturn the reader1 text, and reader2?
Or are they mixed randomly?

-Kuro
 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message