commons-issues mailing list archives

From "Bernd Eckenfels (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (VFS-637) Zip files with legacy encoding and special characters let VFS crash
Date Wed, 21 Jun 2017 18:14:00 GMT

    [ https://issues.apache.org/jira/browse/VFS-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057955#comment-16057955
] 

Bernd Eckenfels commented on VFS-637:
-------------------------------------

What do you think about StandardCharsets.US_ASCII or ISO_8859_1 (Latin-1) as the default? Or would
you leave it absent (which throws for archives not marked as UTF-8)?
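
To make the trade-off between the candidate defaults concrete, here is a minimal sketch that decodes the same legacy filename bytes with each charset. It uses plain String decoding only; the class and method names are made up for illustration, and how ZipFile itself reacts to undecodable bytes is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of commons-vfs2.
class LegacyNameDecoding {

    /** Decodes raw filename bytes with the given candidate default charset. */
    static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        // 0x81 is the byte for u-umlaut in IBM code page 437.
        byte[] raw = {'f', (byte) 0x81, 'r'};

        Charset[] candidates = {
            Charset.forName("IBM437"),
            StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII
        };
        for (Charset cs : candidates) {
            String name = decode(raw, cs);
            // Print the code point of the middle character to avoid console encoding issues.
            System.out.printf("%-10s -> U+%04X%n", cs.name(), (int) name.charAt(1));
        }
        // IBM437 yields U+00FC (the umlaut), ISO-8859-1 yields the control
        // character U+0081, and US-ASCII yields the replacement character U+FFFD.
    }
}
```

Only the code page 437 decoding recovers the original character in this case; the other two produce a name that no longer matches what the archive's creator saw.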

> Zip files with legacy encoding and special characters let VFS crash
> -------------------------------------------------------------------
>
>                 Key: VFS-637
>                 URL: https://issues.apache.org/jira/browse/VFS-637
>             Project: Commons VFS
>          Issue Type: Bug
>         Environment: Windows 10 64 Bit, Java 8
>            Reporter: Guido Schnepp
>              Labels: easyfix
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Oracle has reworked the ZipFile object with Java 7. Since then the default constructor
used by commons-vfs2 2.1 is more restrictive than with Java 6. The ZipFile constructor has
got a second parameter (Charset) now for specification of the legacy charset to be used explicitly
if the ZipFile doesn't state its UTF-8 compliance internally. This affects all ZIP files using
a legacy charset for filename encoding instead of UTF-8, which is the common choice today. This could
be a ZIP file whose archived filenames contain German umlauts or Russian characters,
for example.
> To support this new parameter with (more or less) default values, the class org.apache.commons.vfs2.provider.zip.ZipFileSystem
has to be extended by a default charset parameter, getter or setter (as you like) to forward
this setting to the java.util.zip.ZipFile constructor.
> Quick workaround for me was to create a new OwnZipFileProvider referring to an equally
new OwnZipFileSystem (extending ZipFileSystem) with the following modified function. The change
is the added Charset argument:
> {{	protected ZipFile createZipFile(final File file) throws FileSystemException {
> 		try {
> 			// Changed: pass the legacy charset explicitly (needs java.nio.charset.Charset).
> 			return new ZipFile(file, Charset.forName("IBM437"));
> 		} catch (final IOException ioe) {
> 			throw new FileSystemException("vfs.provider.zip/open-zip-file.error", file, ioe);
> 		}
> 	}
> }}
> Presetting code page 437 as the legacy default charset seems to be a good workaround, as
stated in Appendix D here: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT :
> "D.1 The ZIP format has historically supported only the original IBM PC character encoding
set, commonly referred to as IBM Code Page 437.  This limits storing file name characters
to only those within the original MS-DOS range of values and does not properly support file
names in other character encodings, or  languages. [...]"
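
The effect of the two-argument ZipFile constructor described in the issue can be sketched with JDK classes alone. The example below writes a one-entry archive whose filename is encoded in a legacy charset and reads it back with the same charset passed explicitly. The class and method names are hypothetical, and it assumes the running JDK ships the IBM437 charset.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of commons-vfs2.
class LegacyZipNameDemo {

    /**
     * Writes a temporary archive with one entry whose name is encoded in
     * writeCharset, then returns the name as decoded with readCharset.
     */
    static String roundTrip(String entryName, Charset writeCharset, Charset readCharset) {
        try {
            File zip = File.createTempFile("legacy", ".zip");
            zip.deleteOnExit();
            try (ZipOutputStream out =
                     new ZipOutputStream(Files.newOutputStream(zip.toPath()), writeCharset)) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(1); // one byte of payload
                out.closeEntry();
            }
            // The explicit charset tells ZipFile how to decode entry names
            // that are not flagged as UTF-8 inside the archive.
            try (ZipFile zf = new ZipFile(zip, readCharset)) {
                return zf.entries().nextElement().getName();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437");
        String original = "M\u00fcller.txt"; // contains a German umlaut
        String decoded = roundTrip(original, cp437, cp437);
        System.out.println("name preserved: " + original.equals(decoded));
    }
}
```

Opening the same archive with the single-argument constructor, which assumes UTF-8, is the failing case the issue reports; the exact exception depends on the Java version, so it is left out of the sketch.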



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
