From: "Stefan Bodewig (JIRA)"
To: issues@commons.apache.org
Reply-To: issues@commons.apache.org
Date: Mon, 15 Mar 2010 13:54:27 +0000 (UTC)
Subject: [jira] Commented: (COMPRESS-103) allow data descriptors to follow STORED entries in ZIP archives being read
Message-ID: <992836929.265431268661267127.JavaMail.jira@brutus.apache.org>
In-Reply-To: <1510396244.265281268660667276.JavaMail.jira@brutus.apache.org>

[ https://issues.apache.org/jira/browse/COMPRESS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845310#action_12845310 ]

Stefan Bodewig commented on COMPRESS-103:
-----------------------------------------

This is what the InfoZIP appnote.iz has to say:

{quote}
Bit 3: If this bit is set, the fields crc-32, compressed size and uncompressed size are set to zero in the local header. The correct values are put in the data descriptor immediately following the compressed data. (Note: PKZIP version 2.04g for DOS only recognizes this bit for method 8 compression, newer versions of PKZIP recognize this bit for any compression method.)

[Info-ZIP note: This bit was introduced by PKZIP 2.04 for DOS. In general, this feature can only be reliably used together with compression methods that allow intrinsic detection of the "end-of-compressed-data" condition. From the set of compression methods described in this Zip archive specification, only "deflate" and "bzip2" fulfill this requirement. Especially, the method STORED does not work! The Info-ZIP tools recognize this bit regardless of the compression method; but, they rely on correctly set "compressed size" information in the central directory entry.]
{quote}

So ZipFile uses the same approach as the InfoZIP tools. If we were to take the same approach in ZipArchiveInputStream, we'd have to consume the whole stream and store its content for future use once we hit a STORED entry that uses the data descriptor, because the central directory is stored at the end of the archive.

> allow data descriptors to follow STORED entries in ZIP archives being read
> --------------------------------------------------------------------------
>
>                 Key: COMPRESS-103
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-103
>             Project: Commons Compress
>          Issue Type: New Feature
>    Affects Versions: 1.0
>            Reporter: Stefan Bodewig
>            Priority: Minor
>
> The document named "Word XPS.xps" found under http://www.wssdemo.com/XPS/Forms/AllItems.aspx contains at least one STORED entry that uses a data descriptor after the entry's data to hold size and CRC information.
> The ZipFile class uses information from the central directory and thus knows the size of the entry and can deal with the archive. ZipArchiveInputStream currently can't.
> One solution would be to read the entry until we hit the signature of a data descriptor, a local file header (LFH) or the start of the central directory (CD). If we hit another LFH or the CD, then the data descriptor didn't use the signature (see COMPRESS-101) and the last 12 bytes read were already the data descriptor. This will certainly not be very efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
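
The scanning approach proposed in the issue description could look roughly like the sketch below. It is only an illustration and not the Commons Compress implementation: it assumes the archive bytes are already in memory, and the class name StoredEntryScanner and the method storedDataLength are hypothetical. The three signatures are the on-disk byte sequences for the data descriptor, the local file header and the central directory file header.

```java
class StoredEntryScanner {
    // Signatures as they appear on disk ("PK" followed by two marker bytes).
    static final byte[] DD  = {0x50, 0x4b, 0x07, 0x08}; // data descriptor
    static final byte[] LFH = {0x50, 0x4b, 0x03, 0x04}; // local file header
    static final byte[] CFH = {0x50, 0x4b, 0x01, 0x02}; // central directory file header

    // Size of a data descriptor without signature: crc-32, compressed size,
    // uncompressed size, 4 bytes each.
    static final int DD_WITHOUT_SIG = 12;

    /** True if the bytes at pos equal the given 4-byte signature. */
    static boolean matches(byte[] buf, int pos, byte[] sig) {
        if (pos + sig.length > buf.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if (buf[pos + i] != sig[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical helper: returns the length of the STORED entry's data that
     * starts at dataStart, or -1 if no plausible end could be found.
     */
    static int storedDataLength(byte[] buf, int dataStart) {
        for (int pos = dataStart; pos + 4 <= buf.length; pos++) {
            if (matches(buf, pos, DD)) {
                // The descriptor carries its signature; data ends right here.
                return pos - dataStart;
            }
            if (matches(buf, pos, LFH) || matches(buf, pos, CFH)) {
                // No descriptor signature was seen, so the 12 bytes before
                // this header are assumed to be the data descriptor.
                int len = pos - dataStart - DD_WITHOUT_SIG;
                return len >= 0 ? len : -1;
            }
        }
        return -1;
    }
}
```

Because STORED data is not compressed, it may itself contain any of these byte sequences, so a real reader would presumably have to cross-check a candidate match, for example by comparing the compressed size recorded in the descriptor with the number of bytes consumed so far. A streaming version would also need to buffer and push back the bytes it read past the end of the entry.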