harmony-dev mailing list archives

From "Art - Arthit Suriyawongkul" <art...@gmail.com>
Subject Re: [jira] Created: (HARMONY-62) java.text.BreakIterator.getSentenceInstance().next() treats '\n' as the end of the sentence
Date Tue, 21 Feb 2006 09:23:49 GMT
> As you may know, our (Harmony) implementation just wraps ICU4J's
> BreakIterator. The rules of ICU4J's BreakIterator are compliant with
> Unicode TR29, which differs from the rules of the RI.
>
> This is a common issue for most of the classes in "text". If we want
> our implementation to have the same behavior as the RI, we should get
> the RI's rules. However, I think those rules must be covered by some
> kind of license. So a better solution may be wrapping ICU4J's
> implementation for all text (internationalization) classes. As far as
> I know, ICU4J is specialized for i18n.

Imho, I don't think that different BreakIterator implementations have
to produce exactly the same result ("boundary analysis").

What I mean is that their "behavior" should all be the same,
conforming to what is described in the Java API doc:
  http://java.sun.com/j2se/1.5.0/docs/api/java/text/BreakIterator.html

 Line boundary analysis determines where ...
 Sentence boundary analysis allows ...
 Word boundary analysis is ...
 Character boundary analysis ...

But their results, the "boundary analysis" itself, need not be the same;
that just depends on how well each implementation performs.
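
To make the distinction concrete, here is a rough sketch (not taken from
Harmony, the RI, or ICU4J). It checks only properties that I would expect
any BreakIterator to satisfy when walked with first()/next(): the first
boundary is 0, the last one is the end of the text, and the offsets strictly
increase. The class name and the two helper methods are made up for
illustration, and the sample string is just an assumption modelled on the
'\n' case from the issue title.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class BoundaryContractSketch {

    // Collects every boundary offset the iterator reports for the text.
    public static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    // Checks behavior-level properties only: starts at 0, ends at
    // text.length(), strictly increasing. It says nothing about WHERE
    // the intermediate boundaries fall; that part may differ per implementation.
    public static boolean satisfiesContract(List<Integer> b, String text) {
        if (b.isEmpty() || b.get(0) != 0 || b.get(b.size() - 1) != text.length()) {
            return false;
        }
        for (int i = 1; i < b.size(); i++) {
            if (b.get(i) <= b.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample text with a '\n' in the middle of a sentence (assumed example).
        String text = "This sentence has a line\nbreak inside it. This one does not.";
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        List<Integer> b = boundaries(it, text);
        System.out.println("boundaries: " + b);
        System.out.println("contract ok: " + satisfiesContract(b, text));
    }
}
```

The printed list of boundaries may well differ between the RI and an
ICU4J-backed implementation, while the contract check should pass on both.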

That's my opinion.

cheers,
Art

--
:: Art / Arthit Suriyawongkul
:: Applied Computational Linguistics Lab, Uni Potsdam
:: http://www.ling.uni-potsdam.de/acl-lab/
:: http://bact.blogspot.com/

**  Impeach Thaksin   http://tuthaprajan.org