commons-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: [compress] ZIP - encoding of file names - again
Date Fri, 13 Feb 2009 16:47:04 GMT
On 2009-02-13, Wolfgang Glas <wolfgang.glas@ev-i.at> wrote:

> Stefan Bodewig wrote:

>> Reading
>> =======

>> The question is what ZipFile should assume as its default if neither
>> the EFS nor extra fields are present.  This can be controlled by
>> "setEncoding" right now and defaults to the platform's default
>> encoding but a default of UTF-8 (compatible with java.util.zip) or
>> CodePage 437 (compatible with formal ZIP spec) are valid choices as
>> well.

> AFAICS, Ant API users are used to the 'setEncoding(String encoding)'
> approach, although it would be better to rename the method to
> 'setDefaultEncoding(String encoding)'.

Ant is a different concern since Ant has to keep its API backwards
compatible.  commons-compress is free to break it, and I'm prepared to
do that (and live with more merge conflicts when shipping changes
between compress and Ant).

I agree that setDefaultEncoding would be a better name.
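
To make the reading-side fallback concrete, here is a minimal sketch,
assuming that only the EFS bit (bit 11 of the general purpose bit flag) is
consulted and that the Unicode extra fields are ignored.  The class and
method names are hypothetical; this is not the commons-compress or Ant API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of commons-compress or Ant. */
final class EntryNameDecoding {
    /** Bit 11 of the general purpose bit flag: the language encoding flag (EFS). */
    static final int EFS_FLAG = 1 << 11;

    private EntryNameDecoding() {}

    /** Picks UTF-8 when the EFS bit is set, otherwise the caller's default encoding. */
    static Charset charsetFor(int generalPurposeFlags, Charset defaultEncoding) {
        return (generalPurposeFlags & EFS_FLAG) != 0
                ? StandardCharsets.UTF_8
                : defaultEncoding;
    }

    /** Decodes the raw entry name bytes with the charset chosen above. */
    static String decodeName(byte[] rawName, int generalPurposeFlags, Charset defaultEncoding) {
        return new String(rawName, charsetFor(generalPurposeFlags, defaultEncoding));
    }
}
```

The defaultEncoding parameter stands in for whatever setDefaultEncoding
would configure: the platform default, UTF-8, or CodePage 437, where the
JRE provides that charset.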

>> Writing
>> =======

>> I propose new flags get/setLanguageEncodingFlag for EFS and
>> get/setAddUnicodeExtraFields on ZipArchiveOutputStream that control
>> whether either approach is used.  I.e. I propose to optionally support
>> either approach (and both at the same time).

> The question at this point is whether to use the EFS flag for *all* records or
> only for records not encodable by the encoding set by 'setEncoding(String)'.

> IMHO we should adopt the 7-zip approach and set the EFS flag only for
> non-encodable records, since this approach is minimally invasive.

Works for me.
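
A rough sketch of that per-record decision, assuming the check is done
with CharsetEncoder.canEncode and that UTF-8 plus the EFS bit is the
fallback.  The class, method and field names are made up for illustration
and are not what ZipArchiveOutputStream exposes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of commons-compress. */
final class EntryNameEncoding {
    /** Bit 11 of the general purpose bit flag: the language encoding flag (EFS). */
    static final int EFS_FLAG = 1 << 11;

    /** Encoded name bytes plus the flag bits to merge into the entry's header. */
    static final class Encoded {
        final byte[] bytes;
        final int generalPurposeFlags;

        Encoded(byte[] bytes, int generalPurposeFlags) {
            this.bytes = bytes;
            this.generalPurposeFlags = generalPurposeFlags;
        }
    }

    private EntryNameEncoding() {}

    /**
     * Uses the configured encoding when it can represent the whole name;
     * otherwise falls back to UTF-8 and sets the EFS bit for this record only.
     */
    static Encoded encodeName(String name, Charset configured) {
        if (configured.newEncoder().canEncode(name)) {
            return new Encoded(name.getBytes(configured), 0);
        }
        return new Encoded(name.getBytes(StandardCharsets.UTF_8), EFS_FLAG);
    }
}
```

With this rule an archive whose names all fit the configured encoding
carries no EFS bit at all, which is the minimally invasive behaviour
described above.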

>> IMHO the main question is what the code should do by default.

>> Currently I think the best default approach would be to use UTF-8 as
>> the default encoding and set the EFS bit since this will create
>> archives compatible with java.util.zip but has the additional benefit
>> of clearly stating it is using UTF-8.

> Yes, this seems reasonable, because users will expect Java compatibility
> first and foremost.

>> Note that using the EFS bit may make the archive unreadable for old
>> archivers; that's why we need the option to turn it off.

> I've not seen an old archiver that refused to unpack such a file.

There is a warning and clearly the general purpose bit will be set to
a value some archivers don't understand.  If they ignore the fact,
that is fine.

> How about my suggestion for a 'tuning' method that sets up the
> ZipOutputStream in a way that's suitable for most unzip tools out
> in the wild?

I'm not sure whether we want to encode such magic.

> Or should we gather all the knowledge we gathered in SANDBOX-176 and in this
> thread into the Javadoc of the class?

Yes, we should.

Stefan
