commons-issues mailing list archives

From "Stefan Bodewig (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-183) Support for de/encoding of tar entry names other than plain 8BIT conversion.
Date Sat, 24 Mar 2012 05:39:22 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237447#comment-13237447 ]

Stefan Bodewig commented on COMPRESS-183:
-----------------------------------------

I still need to add comments and want to fix the handling of linkName for tar entries that
represent links, but in general the problem should be fixed as of svn revision 1304709.

The tar package now uses the platform's native encoding by default (this may change to ISO-8859-1
before the release). The encoding can be overridden via a constructor argument.
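
To illustrate why the choice of default matters, here is a minimal JDK-only sketch (not the Commons Compress API) of a name encoder whose charset defaults to the platform encoding but can be overridden through its constructor, as described above. The class `TarNameEncoder` and its methods are hypothetical; the 100-byte width is the size of the name field in a ustar header block. The same name produces a different number of bytes under ISO-8859-1 and UTF-8, and bytes written with one charset do not read back correctly with the other.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not Commons Compress): encodes tar entry names with a
// charset that defaults to the platform encoding but can be overridden.
class TarNameEncoder {
    // Width of the name field in a ustar header block.
    static final int NAME_FIELD_LEN = 100;

    private final Charset charset;

    // Default: the platform's native encoding.
    TarNameEncoder() {
        this(Charset.defaultCharset());
    }

    // Override: the caller chooses the encoding.
    TarNameEncoder(Charset charset) {
        this.charset = charset;
    }

    // Encodes the name and pads it with NUL bytes to the field width.
    byte[] encodeField(String name) {
        byte[] raw = name.getBytes(charset);
        if (raw.length > NAME_FIELD_LEN) {
            throw new IllegalArgumentException(
                "name needs " + raw.length + " bytes, field holds " + NAME_FIELD_LEN);
        }
        return Arrays.copyOf(raw, NAME_FIELD_LEN); // remaining bytes are zero
    }

    // Decodes the field up to the first NUL byte.
    String decodeField(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, charset);
    }

    public static void main(String[] args) {
        String name = "h\u00e4user.txt"; // contains U+00E4 (a with diaeresis)
        TarNameEncoder latin1 = new TarNameEncoder(StandardCharsets.ISO_8859_1);
        TarNameEncoder utf8 = new TarNameEncoder(StandardCharsets.UTF_8);
        // The same name occupies 10 bytes in ISO-8859-1 but 11 in UTF-8.
        System.out.println(name.getBytes(StandardCharsets.ISO_8859_1).length); // 10
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);      // 11
        // Reading UTF-8 bytes with the wrong charset does not round-trip.
        System.out.println(latin1.decodeField(utf8.encodeField(name)).equals(name)); // false
        System.out.println(utf8.decodeField(utf8.encodeField(name)).equals(name));   // true
    }
}
```

Because multi-byte UTF-8 sequences never contain a zero byte, stopping at the first NUL is safe for both charsets used here.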

The output stream has an additional option that tells it to write non-ASCII file names to PAX
extension headers. This should work with any modern tar implementation and is the only way to
get portable archives, at the expense of additional 512-byte blocks.
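
The PAX mechanism mentioned above is defined by POSIX: an extended header entry (typeflag 'x') precedes the real entry, and its data is a sequence of records of the form `<length> <key>=<value>\n`, where `<length>` is the decimal byte count of the whole record including its own digits, and values are UTF-8. The sketch below is a JDK-only illustration, not Commons Compress code; the names `PaxRecords`, `needsPax`, `pathRecord` and `overheadBytes` are hypothetical. It decides whether a name needs a `path` record, builds that record, and computes the extra archive space, since the extended header entry needs its own 512-byte header block plus its data rounded up to whole blocks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers (not Commons Compress) illustrating PAX "path" records.
class PaxRecords {
    static final int BLOCK_SIZE = 512;

    // Only pure-ASCII names are portable in the plain ustar name field.
    static boolean needsPax(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) > 0x7F) {
                return true;
            }
        }
        return false;
    }

    // Builds "<len> path=<name>\n"; <len> is the byte length of the whole
    // record, including its own digits. PAX values are encoded as UTF-8.
    static byte[] pathRecord(String name) {
        String body = " path=" + name + "\n";
        int bodyLen = body.getBytes(StandardCharsets.UTF_8).length;
        int len = bodyLen + String.valueOf(bodyLen).length();
        // Adding the digits may itself add a digit (e.g. 98 + 2 = 100).
        while (len != bodyLen + String.valueOf(len).length()) {
            len = bodyLen + String.valueOf(len).length();
        }
        return (len + body).getBytes(StandardCharsets.UTF_8);
    }

    // Extra archive bytes for one extended header entry: its own 512-byte
    // header block plus the record data padded to whole blocks.
    static int overheadBytes(int recordDataLen) {
        int dataBlocks = (recordDataLen + BLOCK_SIZE - 1) / BLOCK_SIZE;
        return BLOCK_SIZE + dataBlocks * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        String name = "h\u00e4user.txt"; // contains U+00E4
        byte[] record = pathRecord(name);
        System.out.println(needsPax(name));               // true
        System.out.println(record.length);                // 20
        System.out.println(overheadBytes(record.length)); // 1024
    }
}
```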

The input stream will read and apply PAX extension headers transparently.
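
On the reading side, a PAX-aware reader in general consumes the extended header entry, parses its records, and lets keys such as `path` override the corresponding ustar fields before handing the entry to the caller. The following is a minimal JDK-only sketch of that parsing step, assuming the POSIX record layout; `PaxParser`, `parse` and `effectiveName` are hypothetical names and not the Commons Compress implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser (not Commons Compress) for PAX extended-header data,
// which consists of records of the form "<len> <key>=<value>\n".
class PaxParser {
    static Map<String, String> parse(byte[] data) {
        Map<String, String> headers = new LinkedHashMap<>();
        int pos = 0;
        // Stop at the end of the data or at the NUL padding of the last block.
        while (pos < data.length && data[pos] != 0) {
            int sp = pos;
            while (sp < data.length && data[sp] != ' ') {
                sp++;
            }
            if (sp == data.length) {
                throw new IllegalArgumentException("no length field at offset " + pos);
            }
            // The length counts the whole record, including its digits and the '\n'.
            int len = Integer.parseInt(new String(data, pos, sp - pos, StandardCharsets.US_ASCII));
            int end = pos + len;
            if (len <= 0 || end > data.length || data[end - 1] != '\n') {
                throw new IllegalArgumentException("malformed record at offset " + pos);
            }
            String keyValue = new String(data, sp + 1, end - 1 - (sp + 1), StandardCharsets.UTF_8);
            int eq = keyValue.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' at offset " + pos);
            }
            headers.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
            pos = end;
        }
        return headers;
    }

    // A "path" record overrides the (possibly lossy) name from the ustar header.
    static String effectiveName(String headerName, Map<String, String> pax) {
        return pax.getOrDefault("path", headerName);
    }

    public static void main(String[] args) {
        byte[] data = "9 path=a\n12 uid=1000\n\0\0\0".getBytes(StandardCharsets.UTF_8);
        Map<String, String> pax = parse(data);
        System.out.println(pax);                            // {path=a, uid=1000}
        System.out.println(effectiveName("old-name", pax)); // a
    }
}
```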
                
> Support for de/encoding of tar entry names other than plain 8BIT conversion.
> ----------------------------------------------------------------------------
>
>                 Key: COMPRESS-183
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-183
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>    Affects Versions: 1.3
>            Reporter: Joao Schim
>              Labels: patch
>             Fix For: 1.4
>
>         Attachments: patch-tar-name-encoding.diff, patch-tar-name-encoding.diff, patch-tar-name-encoding.diff
>
>
> The names of tar entries are currently encoded and decoded by plain 8-bit conversion from
byte to char and vice versa. This prohibits the use of encodings like UTF-8 in file names.
Whether using UTF-8 (or any other non-ASCII encoding) in file names is sensible is a chapter
of its own. However, tar archives containing files whose names have been encoded with UTF-8
do float around, and commons-compress currently cannot read them correctly because the
encoding is hardcoded to plain 8BIT.
> The supplied patch allows the use of encodings other than 8BIT via a TarArchiveCodec structure.
It does not change the default behavior but adds the option of using a different encoding.
> A test method was added to the TarUtilsTest JUnit test to cover the new functionality.
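
The problem described in the quoted issue can be reproduced with the JDK alone: mapping each byte to the char with the same value is exactly ISO-8859-1 decoding, so the two bytes of a UTF-8 encoded U+00E4 come back as two separate characters. The sketch below assumes a pluggable-codec design in the spirit of the patch; the `NameCodec` interface and its two implementations are hypothetical and are not the patch's actual `TarArchiveCodec` API, which is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Demo first, so the single-file launcher finds main in the first class.
class CodecDemo {
    public static void main(String[] args) {
        String name = "h\u00e4user.txt";                          // contains U+00E4
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8); // as written by a UTF-8 system
        NameCodec old = new EightBitCodec();
        NameCodec utf8 = new CharsetCodec(StandardCharsets.UTF_8);
        System.out.println(old.decode(utf8Bytes).equals(name));  // false: U+00E4 became two chars
        System.out.println(utf8.decode(utf8Bytes).equals(name)); // true
    }
}

// Hypothetical pluggable codec for tar entry names.
interface NameCodec {
    byte[] encode(String name);
    String decode(byte[] bytes);
}

// The old behavior: each byte maps to the char with the same value, and back.
class EightBitCodec implements NameCodec {
    public byte[] encode(String name) {
        byte[] out = new byte[name.length()];
        for (int i = 0; i < name.length(); i++) {
            out[i] = (byte) name.charAt(i); // silently truncates chars above 0xFF
        }
        return out;
    }

    public String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // byte value 0..255 becomes that char
        }
        return sb.toString();
    }
}

// A codec backed by any java.nio Charset, e.g. UTF-8.
class CharsetCodec implements NameCodec {
    private final Charset charset;

    CharsetCodec(Charset charset) {
        this.charset = charset;
    }

    public byte[] encode(String name) {
        return name.getBytes(charset);
    }

    public String decode(byte[] bytes) {
        return new String(bytes, charset);
    }
}
```

Keeping the 8-bit codec as the default implementation preserves the existing behavior, while callers that know their archives use UTF-8 can plug in the charset-backed codec.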

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
