commons-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: [compress] [PATCH] Refactoring of zip encoding support.
Date Tue, 03 Mar 2009 04:40:15 GMT
On 2009-03-02, Wolfgang Glas <wolfgang.glas@ev-i.at> wrote:

> Stefan Bodewig schrieb:
>> On 2009-03-01, Wolfgang Glas <wolfgang.glas@ev-i.at> wrote:

>>> 1) Unicode extra fields are written for all ZIP entries and not only
>>> for entries, which are not encodable by the encoding set to
>>> ZipArchiveOutputStream.

>> Maybe room for yet another flag?  Or an enum-like option

>> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)

Consider the WinZIP case: WinZIP doesn't recognize the EFS flag.  If
you set the encoding to UTF-8 with your code and only add extra fields
for non-encodable paths, no entry ever gets an extra field (every path
is encodable in UTF-8), so WinZIP will never see the correct path.
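
To make the point concrete, here is a minimal sketch of the
"only for non-encodable names" rule.  The class and method names are
made up for illustration and are not part of the patch; the only real
API used is java.nio's CharsetEncoder.canEncode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of commons-compress or the patch.
class EncodableCheck {
    /** True if every character of the name can be encoded in the charset. */
    static boolean isEncodable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    /** NOT_ENCODABLE-style rule: add a Unicode extra field only if needed. */
    static boolean writesExtraFieldWhenNotEncodable(String name, Charset charset) {
        return !isEncodable(name, charset);
    }

    public static void main(String[] args) {
        String name = "\u00e4\u00f6\u00fc/\u20ac.txt";
        // With UTF-8 the name is always encodable, so no extra field is added.
        System.out.println(writesExtraFieldWhenNotEncodable(name, StandardCharsets.UTF_8));
        // With US-ASCII the name is not encodable, so an extra field is added.
        System.out.println(writesExtraFieldWhenNotEncodable(name, StandardCharsets.US_ASCII));
    }
}
```

With UTF-8 as the archive encoding the rule never fires for well-formed
names, which is exactly the case where a reader that ignores the EFS
flag is left without any hint.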


> I like the idea of a unicode policy flag ;-)

That may be a better approach, agreed.  But only if we manage to cover
all edge cases.

> My suggestion is

> setUnicodePolicy(
>   SURROGATES   | /* no extra fields, no utf-8 fallback, only %Uxxxx surrogates*/
>   EXTRA_FIELDS | /* extra fields for unencodable entries, no utf-8 fallback   */
>   EXTRA_FIELDS_ALWAYS | /* extra fields for all entries, no utf-8 fallback    */
>   UTF8_FALLBACK| /* fall back to utf-8 plus EFS flag for unencodable entries. */
>   UTF8_FALLBACK_EXTRA_FIELDS| /* fall back to utf-8 plus EFS flag plus extra
>                                  fields for unencodable */
>   UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS /* fall back to utf-8 plus EFS flag for
>                                        unencodable entries, extra fields for all
>                                        entries. */
> )

> We might drop the last two options and we might choose a better
> wording, however the direction should IMHO be as above mentioned...

This covers all permutations, agreed.

Names, names, I'm really bad at them.

EXTRA_FIELDS                      => ADD_EXTRA_FIELDS_FOR_UNENCODABLE
EXTRA_FIELDS_ALWAYS               => ADD_EXTRA_FIELDS
UTF8_FALLBACK                     => FALL_BACK_TO_UTF8
UTF8_FALLBACK_EXTRA_FIELDS        => FALL_BACK_TO_UTF8_PLUS_EXTRA_FIELD
UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS => FALL_BACK_TO_UTF8_ADD_EXTRA_FIELDS

but looking at the names we may be better off with two independent
options.  Hmm, yes, right now I prefer two flags because they seem to
be orthogonal.
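
A rough sketch of what "two independent options" could look like, to
show that the six policies above fall out as the 3 x 2 combinations of
an extra-field policy and a UTF-8 fallback switch.  All names here are
hypothetical and only for illustration; this is not the API of the
patch or of ZipArchiveOutputStream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of two orthogonal options; names are assumptions.
class ZipNameOptions {
    /** When to add Unicode extra fields. */
    enum ExtraFieldPolicy { NEVER, ALWAYS, NOT_ENCODABLE }

    /** What would be done for a single entry name. */
    static final class Decision {
        final boolean utf8WithEfs; // write the name as UTF-8 and set the EFS flag
        final boolean extraField;  // add a Unicode extra field

        Decision(boolean utf8WithEfs, boolean extraField) {
            this.utf8WithEfs = utf8WithEfs;
            this.extraField = extraField;
        }
    }

    private final Charset encoding;
    private final ExtraFieldPolicy extraFields;
    private final boolean fallBackToUtf8;

    ZipNameOptions(Charset encoding, ExtraFieldPolicy extraFields, boolean fallBackToUtf8) {
        this.encoding = encoding;
        this.extraFields = extraFields;
        this.fallBackToUtf8 = fallBackToUtf8;
    }

    /** The two options are evaluated independently of each other. */
    Decision decide(String name) {
        boolean encodable = encoding.newEncoder().canEncode(name);
        boolean utf8WithEfs = !encodable && fallBackToUtf8;
        boolean extraField = extraFields == ExtraFieldPolicy.ALWAYS
                || (extraFields == ExtraFieldPolicy.NOT_ENCODABLE && !encodable);
        return new Decision(utf8WithEfs, extraField);
    }

    public static void main(String[] args) {
        // Corresponds to UTF8_FALLBACK_EXTRA_FIELDS in the list above.
        ZipNameOptions options = new ZipNameOptions(
                StandardCharsets.US_ASCII, ExtraFieldPolicy.NOT_ENCODABLE, true);
        Decision d = options.decide("\u00e4.txt");
        System.out.println("utf8WithEfs=" + d.utf8WithEfs + " extraField=" + d.extraField);
    }
}
```

In this reading SURROGATES is (NEVER, no fallback), EXTRA_FIELDS is
(NOT_ENCODABLE, no fallback), UTF8_FALLBACK is (NEVER, fallback) and so
on; whenever the fallback is off, an unencodable name would still be
written with %Uxxxx surrogates as in the original proposal.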

Stefan


