Mailing-List: contact issues-help@commons.apache.org; run by ezmlm
Precedence: bulk
Reply-To: issues@commons.apache.org
Date: Mon, 22 Apr 2013 13:57:16 +0000 (UTC)
From: "Sebb (JIRA)" <jira@apache.org>
To: issues@commons.apache.org
Message-ID: <JIRA.12614986.1352219334479.201619.1366639036101@arcas>
In-Reply-To: <JIRA.12614986.1352219334479@arcas>
References: <JIRA.12614986.1352219334479@arcas>
Subject: [jira] [Comment Edited] (IO-356) CharSequenceInputStream#reset()
 behaves incorrectly in case when buffer size is not dividable by data size
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/IO-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637676#comment-13637676 ] 

Sebb edited comment on IO-356 at 4/22/13 1:56 PM:
--------------------------------------------------

testIO_356 is also broken if readFirst > 0.
That's because the initial read fills the byte buffer.
The mark therefore saves the position after the first n chars have been read from the input.
data1 gets the initial buffer load; data2 gets the next n chars.
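To make the scenario concrete, here is a minimal sketch of the check described above: consume readFirst bytes, mark, read n bytes into data1, reset, then read n bytes into data2 and compare. The helper names (readFully, markResetRepeats) are hypothetical and are not taken from the attached test. It runs against the JDK's ByteArrayInputStream, which implements mark/reset correctly, so it shows the expected behaviour rather than the bug.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MarkResetCheck {

    // Hypothetical helper: reads up to len bytes, stopping early only at EOF.
    static byte[] readFully(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int r = in.read(buf, off, len - off);
            if (r < 0) {
                break;
            }
            off += r;
        }
        return Arrays.copyOf(buf, off);
    }

    // Hypothetical check: skip readFirst bytes, mark, read data1, reset, read data2.
    static boolean markResetRepeats(InputStream in, int readFirst, int n) throws IOException {
        readFully(in, readFirst);
        in.mark(n);
        byte[] data1 = readFully(in, n);
        in.reset();
        byte[] data2 = readFully(in, n);
        return Arrays.equals(data1, data2);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "0123456789abc".getBytes(StandardCharsets.UTF_8);
        // ByteArrayInputStream supports mark/reset correctly, so this prints true.
        System.out.println(markResetRepeats(new ByteArrayInputStream(bytes), 3, 5));
    }
}
```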

[later]
I now think the test does make sense.
Even though the individual bytes may be part of a multi-byte character, if the class is to support mark, it ought to do so as if it held plain bytes. If the mark is placed mid-character encoding, the returned bytes might not make much sense, but that's a problem for the application.
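As an illustration of "as if it held plain bytes", the sketch below uses the JDK's ByteArrayInputStream as a reference for byte-level semantics. It places a mark between the two UTF-8 bytes of U+00E9. After reset, the stream returns the continuation byte again, even though that byte cannot be decoded on its own. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

final class MidCharacterMark {

    // Reads one byte, marks, reads the next byte, resets, and reads it again.
    static int[] readAroundMark(byte[] bytes) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        int first = in.read();  // lead byte of the two-byte sequence
        in.mark(1);             // the mark now sits inside the character's encoding
        int second = in.read(); // continuation byte
        in.reset();
        int again = in.read();  // the same continuation byte, returned again
        return new int[] {first, second, again};
    }

    public static void main(String[] args) {
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8); // U+00E9 encodes as C3 A9
        int[] r = readAroundMark(bytes);
        System.out.printf("%02X %02X %02X%n", r[0], r[1], r[2]); // prints: C3 A9 A9
    }
}
```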

For some cases, it would be possible to support mark/reset purely by adjusting the byte buffer pointers.
However, if the byte buffer has been refilled, that won't work, and it becomes necessary to regenerate the byte buffer contents afresh.
One way to do this would be to keep track of where the char buffer position was just before the byte buffer was filled, as well as the position within the byte buffer. In theory, reset can then re-encode the char buffer from that point and restore the byte buffer pointer.
There may need to be some special processing at the start of the encoding.
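The re-encoding approach can be sketched roughly as follows. This is a hypothetical, simplified stream, not the Commons IO CharSequenceInputStream source. It assumes a stateless encoder such as UTF-8 and a buffer of at least 4 bytes. mark() records the char position captured just before the current byte buffer fill, together with the byte offset. reset() rewinds the char buffer, re-encodes the same chunk, and restores the byte pointer. Charsets that emit a BOM after an encoder reset (for example UTF-16) are exactly the case that would need the special start-of-encoding handling mentioned above; this sketch does not handle them.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of mark/reset by re-encoding; assumes a stateless encoder like UTF-8.
final class ReencodingCharSequenceInputStream extends InputStream {
    private final CharsetEncoder encoder;
    private final CharBuffer cbuf;
    private final ByteBuffer bbuf;
    private int cbufPosAtFill; // char position just before the current byte buffer fill
    private int markCharPos;   // cbufPosAtFill captured by mark()
    private int markBytePos;   // byte buffer position captured by mark()

    ReencodingCharSequenceInputStream(CharSequence s, Charset charset, int bufferSize) {
        // Assumption: 4 bytes hold any single code point in UTF-8; smaller buffers could stall.
        if (bufferSize < 4) {
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        cbuf = CharBuffer.wrap(s);
        bbuf = ByteBuffer.allocate(bufferSize);
        bbuf.flip(); // start empty: position 0, limit 0
    }

    // Encodes the next chunk of chars into the emptied byte buffer.
    private void fill() {
        cbufPosAtFill = cbuf.position();
        bbuf.clear();
        encoder.encode(cbuf, bbuf, true);
        bbuf.flip();
    }

    @Override
    public int read() {
        while (!bbuf.hasRemaining()) {
            if (!cbuf.hasRemaining()) {
                return -1;
            }
            fill();
        }
        return bbuf.get() & 0xFF;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public void mark(int readlimit) {
        // readlimit is ignored: the marked bytes can always be regenerated.
        markCharPos = cbufPosAtFill;
        markBytePos = bbuf.position();
    }

    @Override
    public void reset() {
        // Re-encode from where the char buffer stood before the marked fill,
        // then move the byte pointer back to the marked offset within that chunk.
        cbuf.position(markCharPos);
        encoder.reset();
        fill();
        bbuf.position(markBytePos);
    }

    public static void main(String[] args) {
        ReencodingCharSequenceInputStream in =
                new ReencodingCharSequenceInputStream("0123456789abc", StandardCharsets.UTF_8, 10);
        for (int i = 0; i < 3; i++) {
            in.read(); // consume "012", leaving part of the first chunk buffered
        }
        in.mark(0);
        StringBuilder first = new StringBuilder();
        for (int i = 0; i < 10; i++) first.append((char) in.read());
        in.reset();
        StringBuilder second = new StringBuilder();
        for (int i = 0; i < 10; i++) second.append((char) in.read());
        System.out.println(first + " " + second); // prints: 3456789abc 3456789abc
    }
}
```

Re-encoding from the recorded char position reproduces the same chunk only because the encoder here is stateless and each fill starts from an empty buffer of fixed size.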
                
      was (Author: sebb@apache.org):
    testIO_356 is also broken if readFirst > 0.
That's because the initial read fills the byte buffer.
The mark therefore saves the position after the first n chars have been read from the input.
data1 gets the initial buffer load; data2 gets the next n chars.
I'm not sure what the purpose of readFirst is. 
Anyway it makes little sense to read single bytes from an encoding that generates multiple bytes per char.
                  
> CharSequenceInputStream#reset() behaves incorrectly in case when buffer size is not dividable by data size
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: IO-356
>                 URL: https://issues.apache.org/jira/browse/IO-356
>             Project: Commons IO
>          Issue Type: Bug
>          Components: Streams/Writers
>    Affects Versions: 2.4
>            Reporter: Dmitry Katsubo
>         Attachments: CharSequenceInputStreamTest.java
>
>
> The side effect happens when the buffer size of the input stream is not divisible by the requested data size. The bug is in the {{CharSequenceInputStream#reset()}} method, which (I think) should also call {{bbuf.limit(0)}}; otherwise the next call to {{CharSequenceInputStream#read()}} returns the remaining tail that {{bbuf}} has accumulated.
> In the attached test case, the test fails if {{dataSize = 13}} (not divisible by 10) and passes if {{dataSize = 20}} (divisible by 10).
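The reported symptom can be reproduced with a small hypothetical model (ASCII only, buffer size 10, not the Commons IO source). Here mark() saves only the char position, and reset() rewinds it. Without clearing the byte buffer, the second 13-byte read starts with the 7 stale bytes left from the previous fill. With {{bbuf.limit(0)}}, the buffer is refilled from the marked position. With 20 bytes, both fills are fully consumed before reset, so nothing stale is left over.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the reported behaviour; not the Commons IO implementation.
final class TailAfterReset {
    private final CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
    private final CharBuffer cbuf;
    private final ByteBuffer bbuf;
    private final boolean clearOnReset;
    private int mark; // char position only, as in the model being illustrated

    TailAfterReset(CharSequence s, int bufferSize, boolean clearOnReset) {
        this.cbuf = CharBuffer.wrap(s);
        this.bbuf = ByteBuffer.allocate(bufferSize);
        this.bbuf.flip(); // start empty
        this.clearOnReset = clearOnReset;
    }

    int read() {
        if (!bbuf.hasRemaining()) {
            if (!cbuf.hasRemaining()) {
                return -1;
            }
            bbuf.clear();
            encoder.encode(cbuf, bbuf, true);
            bbuf.flip();
        }
        return bbuf.get() & 0xFF;
    }

    void mark() {
        mark = cbuf.position();
    }

    void reset() {
        cbuf.position(mark);
        if (clearOnReset) {
            bbuf.limit(0); // the suggested fix: discard the buffered tail (position drops to 0 too)
        }
    }

    // Reads up to n bytes and returns them as an ASCII string.
    String readString(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = read();
            if (b < 0) {
                break;
            }
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        for (boolean fix : new boolean[] {false, true}) {
            TailAfterReset in = new TailAfterReset(data, 10, fix);
            in.mark();
            String first = in.readString(13);
            in.reset();
            String second = in.readString(13);
            System.out.println(fix + ": " + first + " / " + second);
        }
        // prints:
        // false: ABCDEFGHIJKLM / NOPQRSTABCDEF
        // true: ABCDEFGHIJKLM / ABCDEFGHIJKLM
    }
}
```

Note that this fix alone still saves only the char position, which already points past the buffered chars once a fill has happened, so it does not cover a mark placed after readFirst > 0 bytes.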

--
This message is automatically generated by JIRA.