harmony-dev mailing list archives

From Richard Liang <richard.lian...@gmail.com>
Subject Re: [classlib] charset decoding
Date Fri, 28 Apr 2006 10:25:27 GMT
Paulex Yang wrote:
> Vladimir Strigun wrote:
>> On 4/26/06, Paulex Yang <paulex.yang@gmail.com> wrote:
>>  
>>> Andrew Zhang wrote:
>>>    
>>>> Mikhail Loenko wrote:
>>>>      
>>>>> 2006/4/26, Andrew Zhang <zhanghuangzhu@gmail.com>:
>>>>>        
>>>>>> Vladimir Strigun wrote:
>>>>>>          
>>>>>>> On 4/26/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
>>>>>>> I am trying to find the best solution for Harmony-166, and during
>>>>>>> fix preparation I found the current issue.
>>>>>>>
>>>>>>> Method test_read from tests.api.java.io.InputStreamReaderTest
>>>>>>> failed, and the first fix I created used in.available(), but I
>>>>>>> agree with Mikhail that it is possibly not the best solution.
>>>>>>>             
>>>>>> why?
>>>>>>
>>>>>> Yes, the spec says "The available method for class InputStream
>>>>>> always returns 0.". But it also says "This method should be
>>>>>> overridden by subclasses.". :)
>>>>>>
>>>>>> Here's the available description:
>>>>>> "Returns the number of bytes that can be read (or skipped over) from
>>>>>> this input stream without blocking by the next caller of a method
>>>>>> for this input stream."
>>>>>>
>>>>>> If someone writes a subclass of InputStream whose available() returns
>>>>>> 0 even when some data is available, then he should bear the
>>>>>> consequences of cheating or contradicting the spec.
>>>>>>           
>>>>> What if someone is just unable to say whether the bytes are available?
>>>>> For example, he reads something from a hardware module and
>>>>> the only way to know whether there are bytes is to try reading?
>>>>>
>>>>>         
>>>> I'm a little confused.
>>>>
>>>> How does solution 2 handle this problem?
>>>>       
>>> I agree with Mikhail that available() is not reliable. You can conclude
>>> that the end of the stream has been reached if and only if a read
>>> method returned -1.
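
A minimal Java sketch of the point above, under stated assumptions: OpaqueStream is a hypothetical source that cannot tell in advance whether bytes are ready, and AvailableSketch, availableNow and countBytes are illustrative names that do not appear in Harmony. The stream inherits the default InputStream.available(), which returns 0, yet it still delivers data until read() returns -1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class AvailableSketch {
    /** Hypothetical source that cannot report how many bytes are ready. */
    static class OpaqueStream extends InputStream {
        private final byte[] data;
        private int pos;

        OpaqueStream(byte[] data) { this.data = data.clone(); }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
        // available() is deliberately not overridden, so the
        // InputStream default (which returns 0) applies.
    }

    /** Wraps available() so callers need not handle the checked exception. */
    static int availableNow(InputStream in) {
        try {
            return in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts bytes by reading until -1, the only reliable end-of-stream signal. */
    static int countBytes(InputStream in) {
        try {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new OpaqueStream(new byte[] {1, 2, 3});
        System.out.println("available() = " + availableNow(in));
        System.out.println("bytes actually read = " + countBytes(in));
    }
}
```

A reader that treated available() == 0 as end of input would stop before reading anything from such a stream.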
>>>
>>> I took a quick look at Harmony-166 and its patches (it seems to be the
>>> reason Harmony-410 was raised). I think there should be an easier way
>>> to fix this bug without modifying the CharsetDecoder; please see below
>>> for details.
>>>
>>> The fillBuf() of InputStreamReader should look like:
>>>
>>>    private void fillBuf() throws IOException {
>>>        chars.clear();
>>>        int read = 0;
>>>        // if we haven't reached the end of stream, and cannot decode
>>>        // even one character, go on to read and decode
>>>        while(read != -1 && chars.position() == 0){
>>>            try {
>>>                read = in.read(bytes.array());
>>>            } catch (IOException e) {
>>>                chars.limit(0);
>>>                throw e;
>>>            }
>>>            boolean endOfInput = false;
>>>            if(read == -1){
>>>                // if we have reached the end of stream, try one last
>>>                // time to decode
>>>                bytes.limit(0);
>>>                endOfInput = true;
>>>            }else{
>>>                //if we read some bytes, try to decode them
>>>                bytes.limit(read);
>>>                read = 0;
>>>            }
>>>            decoder.decode(bytes, chars, endOfInput);
>>>            bytes.clear();
>>>        }
>>>        chars.flip();
>>>    }
>>>
>>> Vladimir's test case for Harmony-166 passes with this change.
>>>
>>> If no one objects, I'll attach a patch based on this. Comments?
>>>     
>>
>> Paulex, unfortunately I can't agree with your patch.
>> InputStreamReaderTest passed, but OutputStreamWriterTest still fails.
>> Could you please try to run the test I mentioned?
>>   
> I see, the OutputStreamWriterTest does fail. It seems there's some
> problem with how CharsetDecoder handles a UTF-8 byte stream, but the
> InputStreamReader itself should work because it passes all tests for
> other encodings. I'll go on to study it.
>
> And I also realized there is a bug in my former proposal: it cannot
> support a non-blocking InputStream, so the revised version is as
> below:
>    private void fillBuf() throws IOException {
>        chars.clear();
>        int read = 0;
>        do{
>            try {
>                read = in.read(bytes.array());
>            } catch (IOException e) {
>                chars.limit(0);
>                throw e;
>            }
>            boolean endOfInput = false;
>            if(read == -1){
>                bytes.limit(0);
>                endOfInput = true;
>            }else{
>                bytes.limit(read);
>            }
>            decoder.decode(bytes, chars, endOfInput);
>            bytes.clear();
>        }while(read > 0 && chars.position() == 0);
>        // the main difference from the prior version is checking read > 0
>        // instead of read != -1, so that an InputStreamReader based on a
>        // non-blocking InputStream can return immediately
>        chars.flip();
>    }
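
For readers who want to try the read-and-decode loop outside Harmony, here is a self-contained sketch under several assumptions. It uses the JDK's own java.nio.charset UTF-8 decoder rather than Harmony's ICU4JNI-backed one, and it assumes a blocking stream. OneByteStream is a hypothetical stand-in for the test's LimitedByteArrayInputStream, and FillBufSketch and decodeAll are illustrative names. It also calls bytes.compact() where the proposal calls bytes.clear(), because the documented contract of CharsetDecoder.decode leaves undecoded bytes in the input buffer for the caller to preserve.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class FillBufSketch {
    /** Hypothetical stand-in for LimitedByteArrayInputStream: at most one byte per read. */
    static class OneByteStream extends ByteArrayInputStream {
        OneByteStream(byte[] data) { super(data); }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    /** Decodes a whole blocking stream as UTF-8; only read() == -1 sets endOfInput. */
    static String decodeAll(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(8192);
        CharBuffer chars = CharBuffer.allocate(8192);
        StringBuilder out = new StringBuilder();
        try {
            boolean endOfInput = false;
            while (!endOfInput) {
                // Append new bytes after any undecoded leftovers.
                int read = in.read(bytes.array(), bytes.position(), bytes.remaining());
                if (read == -1) {
                    endOfInput = true;
                } else {
                    bytes.position(bytes.position() + read);
                }
                bytes.flip();
                decoder.decode(bytes, chars, endOfInput);
                // Keep a partial multi-byte sequence for the next pass.
                bytes.compact();
                chars.flip();
                out.append(chars);
                chars.clear();
            }
            decoder.flush(chars);
            chars.flip();
            out.append(chars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeAll(new OneByteStream(utf8)));
    }
}
```

Because the stream hands over one byte at a time, the multi-byte characters are only decoded once their final byte arrives. This is the situation the single-byte test stream is meant to exercise.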
>> Thanks.
>> Vladimir.
>>
>>  
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>>        
>>>>>>> In the current implementation of InputStreamReader, the endOfInput
>>>>>>> variable is set to true if the reader reads fewer than 8192 bytes.
>>>>>>> When InputStreamReader tries to read one character from
>>>>>>> LimitedByteArrayInputStream (a ByteArrayInputStream that only
>>>>>>> returns a single byte per read), true is passed to the charset
>>>>>>> decoder as the endOfInput parameter, even though we still have
>>>>>>> further input from LimitedByteArrayInputStream.
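
To illustrate why a premature endOfInput = true matters, the following sketch feeds the JDK's UTF-8 decoder the first two bytes of a three-byte sequence (the euro sign, U+20AC). It assumes the JDK decoder and its default error action, not Harmony's ICU4JNI-backed decoder, and EndOfInputSketch and decodeTruncated are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class EndOfInputSketch {
    /** Decodes a truncated UTF-8 sequence with the given endOfInput flag. */
    static CoderResult decodeTruncated(boolean endOfInput) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // 0xE2 0x82 0xAC encodes U+20AC; the last byte has not arrived yet.
        ByteBuffer bytes = ByteBuffer.wrap(new byte[] {(byte) 0xE2, (byte) 0x82});
        CharBuffer chars = CharBuffer.allocate(4);
        return decoder.decode(bytes, chars, endOfInput);
    }

    public static void main(String[] args) {
        // false: the decoder asks for more input and leaves the bytes unconsumed.
        System.out.println("endOfInput=false -> " + decodeTruncated(false));
        // true: the decoder treats the partial sequence as malformed input.
        System.out.println("endOfInput=true  -> " + decodeTruncated(true));
    }
}
```

With false the result is an underflow, so the caller can fetch the missing byte and retry. With true the same bytes are reported as malformed.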
>>>>>>>
>>>>>>> 2 methods from tests.api.java.io.OutputStreamWriterTest also
>>>>>>> failed because of the read() method implementation of
>>>>>>> InputStreamReader.
>>>>>>>
>>>>>>> IMO, we have 2 ways to fix Harmony-166:
>>>>>>>
>>>>>>> 1. Don't change the behavior of CharsetDecoder, and use the
>>>>>>> in.available() method for the endOfInput variable. If
>>>>>>> in.available() > 0, pass false to the decoder, read one additional
>>>>>>> byte from the input stream and try to decode again. If the decoding
>>>>>>> operation can decode the remaining portion plus the additional byte
>>>>>>> into a character, stop here; otherwise read an additional byte
>>>>>>> again. Function fillBuf will be invoked by the next invocation of
>>>>>>> the read() method, and the next portion of 8192 bytes will be read
>>>>>>> from the input stream.
>>>>>>>
>>>>>>>             
>>>>>> It's reasonable.
>>>>>> After all, streaming decode is useful and helpful to users.
>>>>>> IMO, streaming decode is a highlight compared with some RI :)
>>>>>>
>>>>>>          
>>>>>>> 2. Change the behavior of CharsetDecoder and use
>>>>>>> bytes.remaining() to calculate endOfInput. In this case, we always
>>>>>>> pass false on the first invocation of the decode method. The rest
>>>>>>> of the algorithm is almost the same as above, but we stop the cycle
>>>>>>> once all bytes are decoded successfully (i.e. if
>>>>>>> bytes.hasRemaining() is false).
>>>>>>>             
>>>> Vladimir, could you please give the detailed code or pseudocode of
>>>> these two solutions?
>>>>
>>>> Thanks!
>>>>
>>>>      
>>>>>>> What do you think about it?
>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Vladimir.
>>>>>>>             
>>>>>> Thanks!
>>>>>>           
>>>>
>>>> ---------------------------------------------------------------------
>>>> Terms of use : http://incubator.apache.org/harmony/mailing.html
>>>> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
>>>> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>>>>
>>>>
>>>>       
>>> -- 
>>> Paulex Yang
>>> China Software Development Lab
>>> IBM
>>>
>>>
>>>
>>
>
>
Yes. It IS a bug in ICU4JNI. I have submitted a bug report [1] to ICU and
have proposed a fix for it.

[1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5180

-- 
Richard Liang
China Software Development Lab, IBM 


