commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Julius Davies (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (CODEC-91) Handling of embedded padding in base64 encoded data changed in 1.4
Date Wed, 02 Jun 2010 21:39:41 GMT

    [ https://issues.apache.org/jira/browse/CODEC-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874811#action_12874811
] 

Julius Davies edited comment on CODEC-91 at 6/2/10 5:39 PM:
------------------------------------------------------------

I carefully analyzed the codec-1.3 logic.  It doesn't quite handle this scenario correctly.
 There's a bug.  Result of running the test case originally supplied by Chris Pimlott against
codec-1.3 is the following:

{code}
byte[] result = b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

// result now contains:
{'c', 'o', 'm', 'm', 'o', 'n', 's', NULL, 'c', 'o', 'm', 'm', 'o', 'n', 's'}
{code}

That NULL shouldn't be there (actually a zero since it's a primitive, but you know what I
mean), so the decode in the commons-codec-1.3 logic is actually causing a corruption of the
data in this scenario.

----------------------
So what should we do for 1.4.X?  Here's what's on the table at the moment:

Option 0:  Treat first PAD as EOF and stop all processing (current 1.4 behaviour).

Option 1:  Restart decode state when PAD is encountered in any location.

Option 2:  Only restart decode state if PAD's are correctly placed in the stream (ie. AB==,
ABC=).  This is closest to the codec-1.3.jar behaviour, but without the NULL bug.


Personally I'm +1 to option 1, for the following reasons:

- Probably easier to program.  The implementation doesn't really need to know much about the
PAD when it encounters one (ie. no need to ask is this a good PAD or a bad PAD?).

- Correctly decodes all Base64 inputs where PADs are in the correct place, so in a way we're
already implementing option 2 here.

- Also properly decodes the invalid input "AB=AB=" if stream corruption happened because of:
s/==/=/g 

- For all other inputs it's garbage-in garbage-out





      was (Author: juliusdavies):
    I carefully analyzed the codec-1.3 logic.  It doesn't quite handle this scenario correctly.
 There's a bug.  Result of running the test case originally supplied by Chris Pimlott against
codec-1.3 is the following:

{code}
byte[] result = b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

// result now contains:
{'c', 'o', 'm', 'm', 'o', 'n', 's', NULL, 'c', 'o', 'm', 'm', 'o', 'n', 's'}

That NULL shouldn't be there (actually a zero since it's a primitive, but you know what I
mean), so the decode in the commons-codec-1.3 logic is actually causing a corruption of the
data in this scenario.

----------------------
So what should we do for 1.4.X?  Here's what's on the table at the moment:

Option 0:  Treat first PAD as EOF and stop all processing (current 1.4 behaviour).

Option 1:  Restart decode state when PAD is encountered in any location.

Option 2:  Only restart decode state if PAD's are correctly placed in the stream (ie. AB==,
ABC=).  This is closest to the codec-1.3.jar behaviour, but without the NULL bug.


Personally I'm +1 to option 1, for the following reasons:

- Probably easier to program.  The implementation doesn't really need to know much about the
PAD when it encounters one (ie. no need to ask is this a good PAD or a bad PAD?).

- Correctly decodes all Base64 inputs where PADs are in the correct place, so in a way we're
already implementing option 2 here.

- Also properly decodes the invalid input "AB=AB=" if stream corruption happened because of:
s/==/=/g 

- For all other inputs it's garbage-in garbage-out




  
> Handling of embedded padding in base64 encoded data changed in 1.4
> ------------------------------------------------------------------
>
>                 Key: CODEC-91
>                 URL: https://issues.apache.org/jira/browse/CODEC-91
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Chris Pimlott
>         Attachments: codec-91-actually-works-and-tests-yay.patch
>
>
> 1.4 changed the way that embedded padding characters (i.e. "=") were handled when decoding
base64 data.  Previously, the decoder ignored them and decoded all the data.  Now it stops
upon encountering the first padding byte.  This breaks compatibility with previous versions.
> For example, in 1.4,
> b64.decode("Y29tbW9ucwo=".getBytes());
> gives the same result as
> b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message