Date: Wed, 15 Nov 2017 16:25:00 +0000 (UTC)
From: "Stefan Bodewig (JIRA)"
To: issues@commons.apache.org
Subject: [jira] [Commented] (COMPRESS-429) Expose whether ZIP entry name & comment come from Unicode extra field

[ https://issues.apache.org/jira/browse/COMPRESS-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253732#comment-16253732 ]

Stefan Bodewig commented on COMPRESS-429:
-----------------------------------------

In my experience only WinZip uses the Unicode extra field; all other tools (apart from Windows Compressed Folders, which doesn't support Unicode at all) have switched to the EFS flag by now. So you may not want to put too much effort into reading the extra field. In addition, looking at what WinZip does (COMPRESS-427 and COMPRESS-176), it's hard to say its content can be trusted.

{{hasUnicodeName()}} would be equivalent to {{getExtraField(UnicodePathExtraField.UPATH_ID) != null}}, and if this were true you'd probably want to call {{getExtraField}} anyway, just in case the {{ZipFile}} or stream has been constructed with {{useUnicodeExtraFields}} set to false.
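For illustration, here is a minimal Java sketch of what that presence check amounts to at the byte level, written against the JDK only rather than Commons Compress. It assumes the Info-ZIP layout of the Unicode Path extra field (header ID 0x7075, the same ID as {{UnicodePathExtraField.UPATH_ID}}): a 1-byte version, a CRC-32 of the entry's raw header name, then the UTF-8 name. The class and method names ({{UnicodeExtraFieldSketch}}, {{findField}}, {{hasUnicodeName}}, {{trustedUnicodeName}}, {{unicodePathField}}) are hypothetical and not part of any Commons Compress API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.CRC32;

/** Hypothetical helper (NOT Commons Compress API) for raw ZIP extra-field bytes. */
public final class UnicodeExtraFieldSketch {
    static final int UNICODE_PATH_ID = 0x7075;    // Info-ZIP Unicode Path extra field
    static final int UNICODE_COMMENT_ID = 0x6375; // Info-ZIP Unicode Comment extra field

    /** Returns the data of the first extra field with the given header ID, if any. */
    static Optional<byte[]> findField(byte[] extra, int headerId) {
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = Short.toUnsignedInt(buf.getShort());
            int size = Short.toUnsignedInt(buf.getShort());
            if (size > buf.remaining()) {
                break; // truncated or corrupt extra-field block
            }
            byte[] data = new byte[size];
            buf.get(data);
            if (id == headerId) {
                return Optional.of(data);
            }
        }
        return Optional.empty();
    }

    /** Mirrors the proposed hasUnicodeName(): only checks that the field is present. */
    static boolean hasUnicodeName(byte[] extra) {
        return findField(extra, UNICODE_PATH_ID).isPresent();
    }

    /** Returns the Unicode name only if version == 1 and the CRC-32 matches the raw header name. */
    static Optional<String> trustedUnicodeName(byte[] extra, byte[] rawNameBytes) {
        Optional<byte[]> field = findField(extra, UNICODE_PATH_ID);
        if (!field.isPresent() || field.get().length < 5) {
            return Optional.empty();
        }
        ByteBuffer data = ByteBuffer.wrap(field.get()).order(ByteOrder.LITTLE_ENDIAN);
        int version = Byte.toUnsignedInt(data.get());
        long storedCrc = Integer.toUnsignedLong(data.getInt());
        CRC32 crc = new CRC32();
        crc.update(rawNameBytes, 0, rawNameBytes.length);
        if (version != 1 || storedCrc != crc.getValue()) {
            return Optional.empty(); // stale or bogus field: fall back to the header name
        }
        byte[] utf8 = new byte[data.remaining()];
        data.get(utf8);
        return Optional.of(new String(utf8, StandardCharsets.UTF_8));
    }

    /** Builds an extra-field block holding a single Unicode Path field (demo only). */
    static byte[] unicodePathField(byte[] rawNameBytes, String unicodeName) {
        byte[] utf8 = unicodeName.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(rawNameBytes, 0, rawNameBytes.length);
        ByteBuffer buf = ByteBuffer.allocate(4 + 5 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) UNICODE_PATH_ID);
        buf.putShort((short) (5 + utf8.length)); // data size: version + CRC + name
        buf.put((byte) 1);                        // version
        buf.putInt((int) crc.getValue());         // CRC-32 of the raw header name
        buf.put(utf8);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] raw = "caf\u00e9.txt".getBytes(StandardCharsets.ISO_8859_1);
        byte[] extra = unicodePathField(raw, "caf\u00e9.txt");
        System.out.println(hasUnicodeName(extra));
        System.out.println(trustedUnicodeName(extra, raw).isPresent());
        // Header name changed after the field was written: the CRC no longer matches
        byte[] renamed = "other.txt".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(trustedUnicodeName(extra, renamed).isPresent());
    }
}
```

The design point the sketch tries to make is that presence alone says little about correctness: given the WinZip problems mentioned in the comment, a caller would likely want a check along the lines of {{trustedUnicodeName}}, which ignores a field whose CRC-32 does not match the name actually stored in the header.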
> Expose whether ZIP entry name & comment come from Unicode extra field
> ---------------------------------------------------------------------
>
>                 Key: COMPRESS-429
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-429
>             Project: Commons Compress
>          Issue Type: Improvement
>            Reporter: Damiano Albani
>            Priority: Minor
>              Labels: Unicode, ZIP
>
> It is a known fact that detecting the encoding of the name/comment of ZIP entries is a messy process, and that general purpose bit 11 is often unreliable.
> As far as I understand, only the so-called Unicode extra field (if present) can be trusted to reliably determine a ZIP entry's name & comment.
> But the current API of Commons Compress doesn't (easily) expose which of these situations the ZIP archive reader is in.
> That's why I propose adding a couple of new fields, exposed through getters/setters, to {{ZipArchiveEntry}}, e.g.:
> {noformat}
> boolean hasUnicodeName
> boolean hasUnicodeComment
> {noformat}
> This way it can easily be determined whether the value returned by {{ZipArchiveEntry::getName}} or {{ZipArchiveEntry::getComment}} can be trusted, or whether it needs some sort of "character encoding sniffing".
> What do you think?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)