From: Wolfgang Glas
Date: Tue, 03 Mar 2009 09:12:08 +0100
To: Commons Developers List
Subject: Re: [compress] [PATCH] Refactoring of zip encoding support.
Stefan Bodewig wrote:
> On 2009-03-02, Wolfgang Glas wrote:
>
>> Stefan Bodewig wrote:
>>> On 2009-03-01, Wolfgang Glas wrote:
>
>>>> 1) Unicode extra fields are written for all ZIP entries and not only
>>>> for entries that are not encodable by the encoding set on
>>>> ZipArchiveOutputStream.
>
>>> Maybe room for yet another flag? Or an enum-like option
>
>>> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)
>
> Consider the WinZIP case: WinZIP wouldn't recognize the EFS. If you
> set the encoding to UTF-8 and use your code and only add extra fields
> for non-encodable paths, WinZIP will never see the correct path.

According to my tests, WinZip does recognize the EFS flag when reading. When writing, WinZip encodes filenames as Cp437 and adds Unicode extra fields, which is really the most useful variant these days.

Secondly, if you set the encoding to UTF-8, there is no need for Unicode extra fields anyway, because every name is encodable in UTF-8. But as mentioned above, the variant that the most tools can read, and the one requested by the reporter of the original SANDBOX-176 issue, is writing Cp437 names and adding Unicode extra fields.

EFS support in the wild is not really widespread, probably due to a mid-air collision between the writing of the specification and the implementation of the widespread ZIP tools...

>> I like the idea of a unicode policy flag ;-)
>
> Maybe a better approach, agreed. But only if we manage to cover all
> border cases.
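The per-entry decisions discussed above can be sketched in Java. This is a minimal illustration under stated assumptions, not the Commons Compress implementation. The class and method names are hypothetical. The sketch assumes two things from the ZIP specification. First, EFS is bit 11 of the general purpose flag. Second, the Info-ZIP Unicode Path extra field has header ID 0x7075 and contains a version byte, the CRC32 of the header name and the UTF-8 name. The exact `%Uxxxx` escape (four uppercase hex digits per UTF-16 unit) is also an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class UnicodePathSketch {
    // General purpose bit 11, the "language encoding flag" (EFS): name is UTF-8.
    static final int EFS_FLAG = 1 << 11;
    // Header ID of the Info-ZIP Unicode Path extra field.
    static final int UNICODE_PATH_ID = 0x7075;

    /** Name bytes as stored in the entry header, plus the flag bits to set. */
    static final class EncodedName {
        final byte[] bytes;
        final int flags;
        EncodedName(byte[] bytes, int flags) { this.bytes = bytes; this.flags = flags; }
    }

    static boolean canEncode(String name, Charset cs) {
        return cs.newEncoder().canEncode(name);
    }

    /** Hypothetical: replaces each unencodable UTF-16 unit by %Uxxxx. */
    static String withSurrogates(String name, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (enc.canEncode(c)) {
                sb.append(c);
            } else {
                sb.append(String.format("%%U%04X", (int) c));
            }
        }
        return sb.toString();
    }

    /** Encodes in cs; unencodable names get UTF-8 + EFS or %Uxxxx escapes. */
    static EncodedName encodeName(String name, Charset cs, boolean fallbackToUtf8) {
        if (canEncode(name, cs)) {
            return new EncodedName(name.getBytes(cs), 0);
        }
        if (fallbackToUtf8) {
            return new EncodedName(name.getBytes(StandardCharsets.UTF_8), EFS_FLAG);
        }
        return new EncodedName(withSurrogates(name, cs).getBytes(cs), 0);
    }

    /** Unicode Path extra field: ID, data size, version, CRC32, UTF-8 name. */
    static byte[] unicodePathExtraField(String name, byte[] headerNameBytes) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(headerNameBytes);
        int dataSize = 1 + 4 + utf8.length;
        ByteBuffer buf = ByteBuffer.allocate(4 + dataSize).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) UNICODE_PATH_ID);
        buf.putShort((short) dataSize);
        buf.put((byte) 1);                // version of this extra field
        buf.putInt((int) crc.getValue()); // lets readers detect a stale extra field
        buf.put(utf8);
        return buf.array();
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437"); // Cp437, what WinZip writes per the thread
        String name = "docs/\u65e5\u672c.txt";    // Japanese, not representable in Cp437
        EncodedName enc = encodeName(name, cp437, false);
        byte[] extra = unicodePathExtraField(name, enc.bytes);
        System.out.println(new String(enc.bytes, cp437) + " flags=" + enc.flags
                + " extraFieldLength=" + extra.length);
    }
}
```

With `fallbackToUtf8` false, the extra field is the only place the real name survives. The Cp437-plus-extra-fields variant therefore depends on readers that understand 0x7075.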
>
>> My suggestion is
>
>> setUnicodePolicy(
>>     SURROGATES |                 /* no extra fields, no utf-8 fallback, only %Uxxxx surrogates */
>>     EXTRA_FIELDS |               /* extra fields for unencodable entries, no utf-8 fallback */
>>     EXTRA_FIELDS_ALWAYS |        /* extra fields for all entries, no utf-8 fallback */
>>     UTF8_FALLBACK |              /* fall back to utf-8 plus EFS flag for unencodable entries */
>>     UTF8_FALLBACK_EXTRA_FIELDS | /* fall back to utf-8 plus EFS flag plus extra
>>                                     fields for unencodable entries */
>>     UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS /* fall back to utf-8 plus EFS flag for
>>                                          unencodable entries, extra fields for all
>>                                          entries */
>> )
>
>> We might drop the last two options and we might choose a better
>> wording, however the direction should IMHO be as mentioned above...
>
> This covers all permutations, agreed.
>
> Names, names, I'm really bad at them.
>
> EXTRA_FIELDS => ADD_EXTRA_FIELDS_FOR_UNENCODABLE
> EXTRA_FIELDS_ALWAYS => ADD_EXTRA_FIELDS
> UTF8_FALLBACK => FALL_BACK_TO_UTF8
> UTF8_FALLBACK_EXTRA_FIELDS => FALL_BACK_TO_UTF8_PLUS_EXTRA_FIELD
> UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS => FALL_BACK_TO_UTF8_ADD_EXTRA_FIELDS
>
> but looking at the names we may be better off with two independent
> options. Hmm, yes, right now I prefer two flags because they seem to
> be orthogonal.

I think you should choose whichever approach better fits your needs in Ant ;-) After all, you are the one who has to write an XML parser for these settings and the documentation, so you might choose the approach that can be explained in the fewest words.

I can live very well with two options ;-)

Wolfgang
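Stefan's point that the two settings are orthogonal can be checked mechanically. Three extra-field choices times two UTF-8 fallback choices give exactly the six policies Wolfgang listed. The Java sketch below is a hypothetical illustration only. The enum names are taken from the proposals in this thread (`NEVER | ALWAYS | NOT_ENCODABLE` and the six `setUnicodePolicy` values), and `addExtraField`/`writeAsUtf8` are invented helpers, not a released API.

```java
public class UnicodePolicySketch {
    /** Stefan's three-valued option for Unicode extra fields. */
    enum ExtraFieldPolicy { NEVER, ALWAYS, NOT_ENCODABLE }

    /** Wolfgang's single six-valued policy, expressed as (extra fields, fallback) pairs. */
    enum UnicodePolicy {
        SURROGATES(ExtraFieldPolicy.NEVER, false),
        EXTRA_FIELDS(ExtraFieldPolicy.NOT_ENCODABLE, false),
        EXTRA_FIELDS_ALWAYS(ExtraFieldPolicy.ALWAYS, false),
        UTF8_FALLBACK(ExtraFieldPolicy.NEVER, true),
        UTF8_FALLBACK_EXTRA_FIELDS(ExtraFieldPolicy.NOT_ENCODABLE, true),
        UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS(ExtraFieldPolicy.ALWAYS, true);

        final ExtraFieldPolicy extraFields;
        final boolean fallbackToUtf8;

        UnicodePolicy(ExtraFieldPolicy extraFields, boolean fallbackToUtf8) {
            this.extraFields = extraFields;
            this.fallbackToUtf8 = fallbackToUtf8;
        }

        /** Inverse mapping: each (extraFields, fallback) pair names exactly one policy. */
        static UnicodePolicy of(ExtraFieldPolicy extraFields, boolean fallbackToUtf8) {
            for (UnicodePolicy p : values()) {
                if (p.extraFields == extraFields && p.fallbackToUtf8 == fallbackToUtf8) {
                    return p;
                }
            }
            throw new AssertionError("unreachable: the six policies cover all pairs");
        }
    }

    /** Whether an entry gets a Unicode extra field under the given policy. */
    static boolean addExtraField(ExtraFieldPolicy policy, boolean encodable) {
        switch (policy) {
            case NEVER: return false;
            case ALWAYS: return true;
            default: return !encodable; // NOT_ENCODABLE
        }
    }

    /** Whether an entry name is written as UTF-8 with the EFS flag set. */
    static boolean writeAsUtf8(boolean fallbackToUtf8, boolean encodable) {
        return fallbackToUtf8 && !encodable;
    }

    public static void main(String[] args) {
        for (UnicodePolicy p : UnicodePolicy.values()) {
            System.out.printf("%-34s extraFields=%-13s fallbackToUtf8=%b%n",
                    p, p.extraFields, p.fallbackToUtf8);
        }
    }
}
```

Because `of` is a bijection, two independent options lose no expressiveness compared with the single enum. They also need only three plus two values to document instead of six.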