From: Wouter De Borger
Date: Fri, 31 Mar 2017 13:16:47 +0200
Subject: Re: Make PDFBox fail on bad pdf
To: users@pdfbox.apache.org

Thanks a lot, that looks like the clean solution!

For Type 0 fonts, no TextPosition is created, but I can live with that.

Thanks,
Wouter

On Thu, Mar 30, 2017 at 6:51 PM, Tilman Hausherr wrote:

> The problem is that some files do this as an obfuscation technique.
>
> What might be detected is fonts that don't have unicode extraction. See in
> LegacyPDFStreamEngine "if (unicode == null)". Make your own or extend it
> and check for TextPosition objects with unicode null. (See
> PrintTextLocations example from the source code download on how to get
> TextPosition objects).
>
> Tilman

--
Wouter De Borger, PhD
Co-founder Inmanta
www.inmanta.com
Email: wouter.deborger@inmanta.com
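
For the archive, the check Tilman describes boils down to "collect every character code whose Unicode lookup returns null, and fail if there are any". Below is a minimal, library-free sketch of that logic only. It is not PDFBox code: the class `UnmappedGlyphCheck`, its methods, and the toy font are hypothetical, and the `IntFunction<String>` is an assumed stand-in for a font's code-to-Unicode lookup. In a real setup the same test would live in your own extension of LegacyPDFStreamEngine, at the "if (unicode == null)" branch mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical model of the "unicode == null" check; not part of PDFBox. */
public class UnmappedGlyphCheck {

    /**
     * Returns the character codes for which the lookup yields no Unicode text.
     * The lookup is an assumed stand-in for a font's code-to-Unicode mapping
     * and is expected to return null when no mapping exists.
     */
    public static List<Integer> findUnmapped(int[] codes, IntFunction<String> toUnicode) {
        List<Integer> unmapped = new ArrayList<>();
        for (int code : codes) {
            if (toUnicode.apply(code) == null) {
                unmapped.add(code);
            }
        }
        return unmapped;
    }

    /** Fails fast when any code cannot be mapped, i.e. "fail on bad pdf". */
    public static void requireExtractable(int[] codes, IntFunction<String> toUnicode) {
        List<Integer> unmapped = findUnmapped(codes, toUnicode);
        if (!unmapped.isEmpty()) {
            throw new IllegalStateException(
                    "No Unicode mapping for character codes " + unmapped);
        }
    }

    public static void main(String[] args) {
        // Toy font (hypothetical): only the codes for 'A'..'Z' have a mapping.
        IntFunction<String> toyFont =
                code -> (code >= 'A' && code <= 'Z') ? String.valueOf((char) code) : null;

        // Codes 72 ('H') and 73 ('I') map; 3 and 7 do not.
        System.out.println(findUnmapped(new int[] {72, 73, 3, 7}, toyFont));
    }
}
```

Collecting the codes before throwing, instead of failing on the first one, keeps the error message useful when a whole font is unmapped. Note the Type 0 caveat from this thread still applies: where no TextPosition is created at all, a check based on TextPosition objects alone will not see those glyphs.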