commons-dev mailing list archives

From "Christian Gosch (JIRA)" <>
Subject [jira] Created: (SANDBOX-176) Enable creation of tool-readable ZIP archives with file names containing non-ASCII characters
Date Tue, 26 Sep 2006 10:22:56 GMT
Enable creation of tool-readable ZIP archives with file names containing non-ASCII characters

                 Key: SANDBOX-176
             Project: Commons Sandbox
          Issue Type: Improvement
          Components: Compress
         Environment: Any / All
            Reporter: Christian Gosch

Currently it is not possible to generate externally readable ZIP archives with*
or org.apache.commons.compress.* when the entries to be included have names containing characters
outside US-ASCII. This should be changed so that at least org.apache.commons.compress.* can
produce ZIP archives for an international context that are readable by common ZIP archiver
tools such as pkzip, gzip, WinZip, PowerArchiver, WinRAR / rar, StuffIt...

For* this is due to a long-standing flaw in the handling of entry names: they are
always encoded as UTF-8, which is somewhat Java-specific, rather than as Cp437, which is what
most (possibly all) ZIP archiver tools expect and write. For more details, see bug IDs 4244499
and 4820807 in the Sun Java bug database.
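To make the mismatch concrete, here is a small Java sketch that encodes the same entry name once as UTF-8 and once as Cp437, and then shows what a Cp437-expecting tool would make of the UTF-8 bytes. The class name EntryNameBytes is hypothetical, and the sketch assumes the JRE ships the extended IBM437 charset (alias "Cp437"), which full JDK installations do but minimal runtimes may not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EntryNameBytes {
    // "Cp437" is an alias of IBM437; it is not one of the charsets every JRE must provide.
    static final Charset CP437 = Charset.forName("Cp437");

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String name = "M\u00FCller.txt"; // "Müller.txt"

        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] cp437 = name.getBytes(CP437);

        // UTF-8 needs two bytes (C3 BC) for U+00FC; Cp437 needs one (81).
        System.out.println("UTF-8: " + hex(utf8));   // 4D C3 BC 6C 6C 65 72 2E 74 78 74
        System.out.println("Cp437: " + hex(cp437));  // 4D 81 6C 6C 65 72 2E 74 78 74

        // A tool assuming Cp437 maps each UTF-8 byte to its own box-drawing character.
        String misread = new String(utf8, CP437);
        System.out.println("UTF-8 bytes read as Cp437: " + misread);
    }
}
```

The misread name comes out as "M" followed by two box-drawing characters (U+251C, U+255D) and then "ller.txt", which is the kind of garbled name external tools show for archives written by Java.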

For* the "compress & save" operation can easily
be improved by extending ZipArchive:

// Add member (null keeps the output stream's default encoding):

    protected String m_encoding = null;

// Add constructor:

    public ZipArchive(String encoding) {
        m_encoding = encoding;
    }

// Extend doSave(FileOutputStream):
// ...
        // Pack operation
        ZipOutputStream out = null;
        try {
            out = new ZipOutputStream(new BufferedOutputStream(output));
            if (m_encoding != null) {          // added: encode entry names with the requested charset
                out.setEncoding(m_encoding);   // added
            }                                  // added
            while (iterator.hasNext()) {
// ...

Now it is possible to instantiate a ZipArchive with "Cp437" as its encoding, and external tools
can recover the original entry names even if they contain non-ASCII characters. (On the
other hand, Java cannot read back and extract such an archive, since it expects UTF-8!)
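Both effects can be reproduced with a self-contained Java sketch. Note that it does not use ZipArchive or the commons-compress stream; instead it relies on the ZipOutputStream(OutputStream, Charset) and ZipInputStream(InputStream, Charset) constructors that java.util.zip gained in Java 7. The class name Cp437ZipDemo is hypothetical, and the extended "Cp437" charset is assumed to be available. The sketch writes one entry with a Cp437 name, checks the raw name bytes in the local file header, and then reads the name back as Cp437 and as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class Cp437ZipDemo {
    static final Charset CP437 = Charset.forName("Cp437");

    // Writes a one-entry archive whose entry name is encoded with nameCharset.
    static byte[] writeZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write("hello".getBytes(StandardCharsets.US_ASCII));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Returns the raw name bytes of the first local file header: the fixed header part
    // is 30 bytes, and the little-endian name length sits at offsets 26-27.
    static byte[] rawFirstName(byte[] zip) {
        int nameLength = (zip[26] & 0xFF) | ((zip[27] & 0xFF) << 8);
        byte[] name = new byte[nameLength];
        System.arraycopy(zip, 30, name, 0, nameLength);
        return name;
    }

    // Reads the first entry name back, decoding names with nameCharset.
    static String readFirstName(byte[] zip, Charset nameCharset) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String name = "M\u00FCller.txt";
        byte[] zip = writeZip(name, CP437);

        // 10 bytes: the umlaut is stored as the single Cp437 byte 0x81.
        System.out.println("stored name length: " + rawFirstName(zip).length);
        System.out.println("read as Cp437 matches: " + name.equals(readFirstName(zip, CP437)));

        try {
            // UTF-8 is what the one-argument ZipInputStream constructor uses;
            // a lone 0x81 byte is not valid UTF-8.
            readFirstName(zip, StandardCharsets.UTF_8);
            System.out.println("read as UTF-8: succeeded");
        } catch (IllegalArgumentException | IOException e) {
            System.out.println("read as UTF-8: failed (" + e.getClass().getSimpleName() + ")");
        }
    }
}
```

The exact exception type for the failed UTF-8 read may differ between JDK versions, which is why the sketch catches both IllegalArgumentException and IOException.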

The "read & extract" operation for ZipArchive is harder to extend, since it currently
relies completely on*. Another reason is that ZIP archives contain no hint about the
character encoding used for file names etc. It seems that the common tools simply use Cp437
and Java simply uses UTF-8, without giving any reason. Thus an extractor has to guess by
trying candidate encodings.
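One possible way to "try" is sketched below in Java; this is an illustrative heuristic, not something ZipArchive does today. Raw name bytes are decoded strictly as UTF-8 first, and only if that fails does the decoder fall back to Cp437. The class and method names (EntryNameGuesser, guessEntryName) are hypothetical, and the extended "Cp437" charset is assumed to be available.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EntryNameGuesser {
    static final Charset CP437 = Charset.forName("Cp437");

    // Strict decoding: reports malformed input instead of substituting U+FFFD.
    static String decodeStrict(byte[] raw, Charset charset) throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    // Heuristic: accept the bytes as UTF-8 if they form valid UTF-8; otherwise fall back
    // to Cp437, which assigns a character to every byte value.
    static String guessEntryName(byte[] raw) {
        try {
            return decodeStrict(raw, StandardCharsets.UTF_8);
        } catch (CharacterCodingException notUtf8) {
            return new String(raw, CP437);
        }
    }

    public static void main(String[] args) {
        byte[] fromJava = {0x4D, (byte) 0xC3, (byte) 0xBC, 0x6C, 0x6C, 0x65, 0x72}; // "Müller" as UTF-8
        byte[] fromPkzip = {0x4D, (byte) 0x81, 0x6C, 0x6C, 0x65, 0x72};            // "Müller" as Cp437
        System.out.println(guessEntryName(fromJava).equals("M\u00FCller"));  // true
        System.out.println(guessEntryName(fromPkzip).equals("M\u00FCller")); // true
    }
}
```

This remains a guess: some Cp437 byte sequences also happen to be valid UTF-8 (for example 0xC3 0xBC, which Cp437 displays as two box-drawing characters), and such names would be misread as UTF-8.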

For TarArchive the situation is unclear. As far as I can see, the commons-compress implementation
does not rely on third-party code here, and TAR is not a Java-bound file type (unlike JAR, which
is Java-bound). Thus chances are that everything works well, even when entry names with non-ASCII
characters come into play.
