commons-issues mailing list archives

From "Stefan Bodewig (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-429) Expose whether ZIP entry name & comment come from Unicode extra field
Date Wed, 15 Nov 2017 16:25:00 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253732#comment-16253732 ]

Stefan Bodewig commented on COMPRESS-429:
-----------------------------------------

In my experience only WinZip uses the Unicode extra field; all others (apart from Windows
Compressed Folders, which doesn't support Unicode at all) have switched to the EFS flag
(general purpose bit 11) by now. So maybe you do not want to put too much effort into reading
the extra field. In addition, given what WinZip does (COMPRESS-427 and COMPRESS-176), it's
hard to say whether one can trust its content.
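
The EFS flag is bit 11 (mask 0x0800) of the general purpose bit flag in each ZIP header; when it is set, the entry's name and comment are stored as UTF-8. A minimal pure-JDK sketch of how a reader might act on it, assuming the raw flag value and raw name bytes have already been read from the header. {{EfsFlag}} and its methods are hypothetical names, not Commons Compress API, and the fallback charset is whatever the caller guesses for archives without the flag.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Commons Compress.
final class EfsFlag {
    // General purpose bit 11, the "language encoding flag" (EFS).
    static final int EFS_MASK = 1 << 11; // 0x0800

    // True if the header declares its name and comment as UTF-8.
    static boolean usesUtf8(int generalPurposeFlag) {
        return (generalPurposeFlag & EFS_MASK) != 0;
    }

    // Decodes the raw name bytes as UTF-8 when the EFS bit is set,
    // otherwise with the caller-supplied fallback charset (a guess).
    static String decodeName(int generalPurposeFlag, byte[] rawName, Charset fallback) {
        Charset cs = usesUtf8(generalPurposeFlag) ? StandardCharsets.UTF_8 : fallback;
        return new String(rawName, cs);
    }
}
```

When the bit is clear, nothing in the header says which charset was used, which is why the fallback has to be supplied from outside.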

{{hasUnicodeName()}} would be equivalent to {{getExtraField(UnicodePathExtraField.UPATH_ID) != null}},
and if this were true you'd probably want to call {{getExtraField}} anyway, just in case the
{{ZipFile}} or stream has been constructed with {{useUnicodeExtraFields}} set to false (in which
case the name has not been taken from the extra field).
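
To make the equivalence concrete without depending on Commons Compress, here is a minimal sketch that walks an entry's raw extra field bytes (records made of a 2-byte little-endian header ID, a 2-byte little-endian data size, then the data) and reports whether a record with a given ID is present. 0x7075 and 0x6375 are the Info-ZIP header IDs behind {{UnicodePathExtraField.UPATH_ID}} and {{UnicodeCommentExtraField.UCOM_ID}}; the class and method names are hypothetical. Like {{getExtraField(...) != null}}, it only detects presence, regardless of whether {{useUnicodeExtraFields}} caused the name to be replaced.

```java
import java.util.Arrays;

// Hypothetical helper operating on the raw extra field bytes of one ZIP header.
final class ExtraFields {
    // Info-ZIP header IDs of the Unicode Path and Unicode Comment extra fields.
    static final int UNICODE_PATH_ID = 0x7075;
    static final int UNICODE_COMMENT_ID = 0x6375;

    // Returns a copy of the data of the first record with the given ID, or null.
    static byte[] find(byte[] extra, int headerId) {
        int pos = 0;
        while (pos + 4 <= extra.length) {
            // Header ID and data size are both unsigned 16-bit little-endian.
            int id = (extra[pos] & 0xFF) | ((extra[pos + 1] & 0xFF) << 8);
            int size = (extra[pos + 2] & 0xFF) | ((extra[pos + 3] & 0xFF) << 8);
            int start = pos + 4;
            if (start + size > extra.length) {
                return null; // truncated record: stop parsing
            }
            if (id == headerId) {
                return Arrays.copyOfRange(extra, start, start + size);
            }
            pos = start + size;
        }
        return null;
    }

    static boolean hasUnicodeName(byte[] extra) {
        return find(extra, UNICODE_PATH_ID) != null;
    }

    static boolean hasUnicodeComment(byte[] extra) {
        return find(extra, UNICODE_COMMENT_ID) != null;
    }
}
```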

> Expose whether ZIP entry name & comment come from Unicode extra field
> ---------------------------------------------------------------------
>
>                 Key: COMPRESS-429
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-429
>             Project: Commons Compress
>          Issue Type: Improvement
>            Reporter: Damiano Albani
>            Priority: Minor
>              Labels: Unicode, ZIP
>
> It is a known fact that detecting the encoding of the name/comment of ZIP entries is a
> messy process, and that general purpose bit 11 is often unreliable.
> Only the so-called Unicode extra field (if present) can be trusted to reliably determine
> a ZIP entry name & comment, as far as I understand.
> But the current API of Commons Compress doesn't (easily) expose which of these situations
> the ZIP archive reader is in.
> That's why I propose adding a couple of new getter/setter-exposed fields to
> {{ZipArchiveEntry}}, e.g.:
> {noformat}
> boolean hasUnicodeName
> boolean hasUnicodeComment
> {noformat}
> This way one could easily determine whether the value returned by {{ZipArchiveEntry::getName}}
> or {{ZipArchiveEntry::getComment}} can be trusted, or whether it needs some sort of "character
> encoding sniffing".
> What do you think?
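
Whether the extra field can be trusted is partly checkable. Per the Info-ZIP specification, the Unicode Path extra field stores a version byte, a CRC32 of the header's original name bytes, and the UTF-8 name, so a reader can discard the field when the CRC no longer matches (for example, because another tool renamed the entry without updating the field). A minimal sketch of that check, assuming the field's data has already been extracted; {{UnicodePath.trustedName}} is a hypothetical name, and rejecting versions other than 1 is a conservative choice of this sketch. The Unicode Comment extra field (0x6375) uses the same layout with a CRC32 of the comment, so the same check applies there.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of Commons Compress.
final class UnicodePath {
    // Returns the UTF-8 name from the data of a Unicode Path extra field
    // (header ID 0x7075), or null if it is absent, malformed, or stale.
    static String trustedName(byte[] fieldData, byte[] rawHeaderName) {
        // Layout: 1-byte version, 4-byte little-endian NameCRC32, UTF-8 name.
        if (fieldData == null || fieldData.length < 5 || fieldData[0] != 1) {
            return null;
        }
        long storedCrc = (fieldData[1] & 0xFFL)
                | ((fieldData[2] & 0xFFL) << 8)
                | ((fieldData[3] & 0xFFL) << 16)
                | ((fieldData[4] & 0xFFL) << 24);
        // The CRC covers the raw name bytes from the regular header field.
        CRC32 crc = new CRC32();
        crc.update(rawHeaderName, 0, rawHeaderName.length);
        if (crc.getValue() != storedCrc) {
            return null; // header name changed after the extra field was written
        }
        byte[] utf8 = Arrays.copyOfRange(fieldData, 5, fieldData.length);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

A matching CRC only shows that the field belongs to the current header name; it does not prove the UTF-8 name itself is correct, which is the WinZip concern raised in the comment.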



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
