harmony-dev mailing list archives

From Richard Liang <richard.lian...@gmail.com>
Subject Re: [jira] Created: (HARMONY-62) java.text.BreakIterator.getSentenceInstance().next() treats '\n' as the end of the sentence
Date Tue, 21 Feb 2006 08:50:59 GMT
Dear Tatyana,

As you may know, our (Harmony) implementation just wraps ICU4J's 
BreakIterator. ICU4J's BreakIterator rules comply with Unicode TR29, 
which differs from the rules used by the RI.
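
To see where two rule sets diverge on a given input, it may help to list 
every boundary rather than only the first next() result. The following is 
a minimal sketch using only the standard java.text API. The class name 
SentenceBoundaries and the helper boundaries() are illustrative names, not 
part of Harmony or ICU4J, and the code assumes a current JDK (it uses 
generics). The boundaries it prints depend on the rule set of the 
implementation it runs on.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration only.
class SentenceBoundaries {

    // Collects every sentence boundary reported for the given text,
    // starting with first() and ending with the last boundary before DONE.
    static List<Integer> boundaries(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same input as the reproducer in HARMONY-62.
        System.out.println(boundaries("One sentence \n on two lines."));
    }
}
```

Running this on two implementations and comparing the lists shows whether 
one of them reports an extra boundary after the '\n'.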

This is a common issue for most of the classes in "text". If we want 
our implementation to behave the same as the RI, we would need the RI's 
rules. However, I think those rules are probably covered by some kind of 
license. So a better solution may be to wrap ICU4J's implementation for 
all text (internationalization) classes. As far as I know, ICU4J is 
specialized for i18n.

Any comments? Thanks a lot.

Please refer to ICU's homepage: http://icu.sourceforge.net/

Richard Liang
China Software Development Lab, IBM



tatyana doubtsova (JIRA) wrote:
> java.text.BreakIterator.getSentenceInstance().next() treats '\n' as the end of the sentence
> -------------------------------------------------------------------------------------------
>
>          Key: HARMONY-62
>          URL: http://issues.apache.org/jira/browse/HARMONY-62
>      Project: Harmony
>         Type: Bug
>   Components: Classlib  
>     Reporter: tatyana doubtsova
>
>
> Problem details:
> java.text.BreakIterator.getSentenceInstance().next() stops searching for the sentence
> end if a new-line character is found in the text, and returns the index of the last seen
> non-whitespace character. According to the J2SE 1.4.2 specification, next() should return
> the boundary following the current boundary.
>
> Code for reproducing Test.java:
> import java.text.BreakIterator;
> public class Test {
>     public static void main(String [] args)
>     {
>     	BreakIterator it = BreakIterator.getSentenceInstance();
>     	it.setText("One sentence \n on two lines.");
>     	System.out.println(it.next());
>     }
> }
>
> Steps to Reproduce:
> 1. Build Harmony (check-out on 2006-01-30) j2se subset as described in README.txt.
> 2. Compile Test.java using BEA 1.4 javac:
>      javac -d . Test.java
> 3. Run java using compatible VM (J9):
>      java -showversion Test
>
> Output:
> java version 1.4.2 (subset)
> (c) Copyright 1991, 2005 The Apache Software Foundation or its licensors, as applicable.
> 14
>
> Output on BEA 1.4.2 to compare with:
> 28
>
> Suggested junit test case:
>
> package org.apache.harmony.tests.java.text;
>
> import java.text.BreakIterator;
> import java.util.Locale;
>
> import junit.framework.TestCase;
>
> public class BreakIteratorTest extends TestCase {
>
> 	public void test_next() {
> 		// Regression test for HARMONY-30
> 		BreakIterator bi = BreakIterator.getWordInstance(Locale.US);
> 		bi.setText("This is the test, WordInstance");
> 		int n = bi.first();
> 		n = bi.next();
> 		assertEquals("Assert 0: next() returns incorrect value ", 4, n); 
>
> 		// Regression test for the current issue
> 		bi = BreakIterator.getSentenceInstance();
> 		bi.setText("One sentence \n on two lines.");
> 		n = bi.next();
> 		assertEquals("Assert 1: next() returns incorrect value ", 28, n);
> 	}
> }
>
>
>   
