From: Tilman Hausherr
To: users@pdfbox.apache.org
Subject: Re: Potential problem with PDType1Font.encode
Date: Sat, 8 Sep 2018 11:12:16 +0200

Thank you, this is really disturbing :-( I'll investigate that.

Tilman

On 07.09.2018 at 15:39, Daniel Wildschut wrote:
> Hello, we use PDFBox to fill in PDF forms and stumbled on a potential
> bug while sanitizing the input.
>
> We call PDFont.encode to check beforehand whether a given character can
> be inserted using the given font.
>
> However, we noticed that the result of the method call can change
> depending on what other strings have been checked before.
>
> Apparently PDType1Font stores previous results in a codeToBytesMap,
> which then causes the unexpected behavior.
>
> I'd say that the key used in "codeToBytesMap.put(code, bytes);" is
> wrong; you probably want to use the method parameter "unicode" instead.
>
> I tested 2.0.11, the current 2.0.x branch and the 3.0.x branch and was
> able to reproduce the problem with all of them.
>
> Code to reproduce:
>
> import java.io.IOException;
>
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDType1Font;
>
> public class PDFBoxEncodeTest
> {
>     public static void main( final String[] args )
>     {
>         final PDType1Font font = PDType1Font.HELVETICA_BOLD;
>         // Same character before and after encoding the euro sign
>         tryEncode(font, "\u0080");
>         tryEncode(font, "€");
>         tryEncode(font, "\u0080");
>     }
>
>     // Reports whether the first character of str can be encoded in font
>     private static void tryEncode(final PDFont font, final String str) {
>         try {
>             font.encode(str);
>             System.out.println("Character " + str.codePointAt(0)
>                     + " can be encoded in Font " + font);
>         } catch (final IOException | IllegalArgumentException e) {
>             System.out.println("Character " + str.codePointAt(0)
>                     + " cannot be encoded in Font " + font + ": " + e.getMessage());
>         }
>     }
> }
>
>
> Expected output:
>
> Character 128 cannot be encoded in Font PDType1Font Helvetica-Bold: U+0080 ('.notdef') is not available in this font Helvetica-Bold encoding: WinAnsiEncoding
> Character 8364 can be encoded in Font PDType1Font Helvetica-Bold
> Character 128 cannot be encoded in Font PDType1Font Helvetica-Bold: U+0080 ('.notdef') is not available in this font Helvetica-Bold encoding: WinAnsiEncoding
>
> Actual output:
>
> Character 128 cannot be encoded in Font PDType1Font Helvetica-Bold: U+0080 ('.notdef') is not available in this font Helvetica-Bold encoding: WinAnsiEncoding
> Character 8364 can be encoded in Font PDType1Font Helvetica-Bold
> Character 128 can be encoded in Font PDType1Font Helvetica-Bold
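
The suspected cause described in the report (a cache that is read with the Unicode code point but written with the encoded byte code) can be modelled without PDFBox. The following is only a sketch under assumptions: the class EncodeCacheSketch, its Encoder, lookupCode and the keyByUnicode switch are hypothetical names invented for illustration, and the toy encoding table covers just printable ASCII plus the euro sign, which sits at code 128 in WinAnsiEncoding. It is not PDFBox's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reported caching problem; not PDFBox code.
final class EncodeCacheSketch {

    // Toy single-byte encoding: returns the byte code for a code point, or -1.
    static int lookupCode(int unicode) {
        if (unicode == 0x20AC) {
            return 0x80; // euro sign is at code 128 in WinAnsiEncoding
        }
        if (unicode >= 0x20 && unicode <= 0x7E) {
            return unicode; // printable ASCII maps to itself
        }
        return -1; // everything else is unavailable in this toy table
    }

    static final class Encoder {
        private final Map<Integer, byte[]> cache = new HashMap<>();
        private final boolean keyByUnicode;

        // keyByUnicode = false models the suspected bug, true the proposed fix.
        Encoder(boolean keyByUnicode) {
            this.keyByUnicode = keyByUnicode;
        }

        byte[] encode(int unicode) {
            // The cache is always read with the Unicode code point.
            byte[] cached = cache.get(unicode);
            if (cached != null) {
                return cached;
            }
            int code = lookupCode(unicode);
            if (code < 0) {
                throw new IllegalArgumentException(
                        String.format("U+%04X is not available", unicode));
            }
            byte[] bytes = { (byte) code };
            // Writing with the byte code instead of the code point poisons the cache.
            cache.put(keyByUnicode ? unicode : code, bytes);
            return bytes;
        }

        boolean canEncode(int unicode) {
            try {
                encode(unicode);
                return true;
            } catch (IllegalArgumentException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) {
        for (boolean keyByUnicode : new boolean[] { false, true }) {
            Encoder enc = new Encoder(keyByUnicode);
            System.out.println("keyByUnicode=" + keyByUnicode + ": "
                    + enc.canEncode(0x80) + " "
                    + enc.canEncode(0x20AC) + " "
                    + enc.canEncode(0x80));
        }
    }
}
```

In this model, keying by the byte code makes the second check of U+0080 succeed after the euro sign has been encoded, which matches the "Actual output" in the report. Keying by the code point keeps the three checks independent of each other.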