lucene-java-user mailing list archives

From "Grant Ingersoll" <gsing...@syr.edu>
Subject Re: Tokenizers and java.text.BreakIterator
Date Tue, 20 Jul 2004 20:27:49 GMT
Answering my own question: I think it is because Tokenizers work with a Reader, and you would have
to read in the whole document in order to use the BreakIterator, which operates on a String...
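
To make the mismatch concrete, here is a rough sketch of what bridging the two would look like. The class and method names (ReaderWordSplitter, readAll, wordOffsets) are made up for illustration and are not Lucene API; the only library calls are the standard java.io and java.text ones. Note that the whole Reader is buffered in memory before the BreakIterator sees any text, which is exactly the cost mentioned above. BreakIterator#setText also accepts a CharacterIterator, but that interface needs random access too, so I assume it does not avoid the buffering by itself.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ReaderWordSplitter {

    // Hypothetical helper: drains the Reader completely into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Returns {start, end} offsets of each span that begins with a letter or digit.
    public static List<int[]> wordOffsets(Reader reader, Locale locale) {
        String text;
        try {
            text = readAll(reader); // the entire document is in memory from here on
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<int[]> spans = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Skip the whitespace and punctuation spans the iterator also reports.
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                spans.add(new int[] {start, end});
            }
        }
        return spans;
    }
}
```

The offsets are what a tokenizer would need to report token positions; pulling the token text out is then just text.substring(start, end).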

>>> gsingers@syr.edu 07/20/04 03:23PM >>>
Hi,

Was wondering if anyone uses java.text.BreakIterator#getWordInstance(Locale) as a tokenizer
for various languages?  Does it do a good job?  It seems like it does, at least for languages
where words are separated by spaces or punctuation, but I have only done simple tests.
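
For reference, the kind of simple test I mean looks roughly like the sketch below. The class name BreakIteratorWords and the helper words(...) are just illustrative names; the BreakIterator calls are the standard java.text ones. The filter that drops punctuation and whitespace spans is my own assumption about what a tokenizer would want, not something BreakIterator does for you.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorWords {

    // Hypothetical helper: splits text at the locale's word boundaries
    // and keeps only the spans that contain a letter or digit.
    public static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Boundaries also delimit runs of spaces and punctuation; drop those.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, How, are, you]
        System.out.println(words("Hello, world! How are you?", Locale.ENGLISH));
    }
}
```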

Anyone have any thoughts on this?  What am I missing?  Does this seem like a valid approach?

Thanks,
Grant


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org 
For additional commands, e-mail: lucene-user-help@jakarta.apache.org 


