lucene-java-user mailing list archives

From Valery <>
Subject Re: Any Tokenizator friendly to C++, C#, .NET, etc ?
Date Fri, 21 Aug 2009 12:18:22 GMT

Simon Willnauer wrote:
> I already responded... again...
Sorry, I was in the middle of answering and saw your post right after sending.

Simon Willnauer wrote:
> Tokenizer splits the input stream into tokens and
> TokenFilter subclasses operate on those. I expect from a Tokenizer
> that it provides me a stream of tokens :) - how those tokens are
> created is the responsibility of the Tokenizer.

According to your requirements:

 * one programmer will write a simplistic Tokenizer that converts the whole
char input into one huge token.

 * another programmer will write a simplistic Tokenizer that converts each
single char of the input into a 1-char token, which ends up as a huge
number of 1-char tokens.

Moreover, both will claim the job was done brilliantly, because the
Tokenizer is based on a 1-line statement in Java...
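
To make the two extremes concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lucene Tokenizer API; the class name DegenerateTokenizers and both method names are made up for illustration, and tokens are modelled as plain strings, which is an assumption of this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only: these are NOT Lucene classes.
// Tokens are modelled as plain Strings to keep the sketch self-contained.
final class DegenerateTokenizers {

    // "Tokenizer" of the first programmer: the whole input is one huge token.
    static List<String> wholeInput(String input) {
        return List.of(input);
    }

    // "Tokenizer" of the second programmer: every char is its own 1-char token.
    static List<String> perChar(String input) {
        return input.chars()
                .mapToObj(c -> String.valueOf((char) c))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "C++ and C#";
        System.out.println(wholeInput(input)); // a list with a single element
        System.out.println(perChar(input));    // one element per char, spaces included
    }
}
```

Both methods satisfy "provides a stream of tokens", and each body is essentially one statement, yet neither yields tokens that are useful for searching text that mentions C++ or C#.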

Who did the work better?

That said, I'd love to hear more specific requirements for a Tokenizer, to
avoid odd deliveries like the ones above :)

