lucene-java-user mailing list archives

From Valery <khame...@gmail.com>
Subject Re: Any Tokenizator friendly to C++, C#, .NET, etc ?
Date Fri, 21 Aug 2009 12:18:22 GMT


Simon Willnauer wrote:
> 
> I already responded... again...
> 
Sorry, I was in the middle of answering and saw your post only after sending.


Simon Willnauer wrote:
> 
> Tokenizer splits the input stream into tokens (Token.java) and
> TokenFilter subclasses operate on those. I expect from a Tokenizer
> that is provides me a stream of tokens :) - how those tokens are
> created is the responsibility of the Tokenizer.

According to your requirements:

 * one programmer will write a simplistic Tokenizer that converts the whole
char input into one huge token.

 * another programmer will write a simplistic Tokenizer that converts each
single char of the input into a 1-char token, producing a huge number of
1-char tokens.

Moreover, both will claim the job was done brilliantly, because each
Tokenizer boils down to a one-line statement in Java...
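
To make the two deliveries concrete, here is a minimal sketch in plain Java.
It deliberately does not use the Lucene Tokenizer/Token API; the class and
method names (TrivialTokenizers, oneHugeToken, oneTokenPerChar) are
hypothetical and exist only for illustration. Both methods technically
"provide a stream of tokens", which is the whole requirement as stated:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not Lucene's Tokenizer API.
class TrivialTokenizers {

    // Delivery 1: the whole input becomes a single huge token.
    static List<String> oneHugeToken(String input) {
        return Collections.singletonList(input);
    }

    // Delivery 2: every char of the input becomes its own 1-char token.
    static List<String> oneTokenPerChar(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "C++ and C#";
        System.out.println(oneHugeToken(input).size() + " token(s): " + oneHugeToken(input));
        System.out.println(oneTokenPerChar(input).size() + " token(s): " + oneTokenPerChar(input));
    }
}
```

Neither result is useful for searching text that mentions C++ or C#, yet
neither violates the requirement.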

Who did the work better?

That said, I'd love to hear more specific requirements for a Tokenizer, to
avoid odd deliveries like the ones above :)

regards
Valery
