Subject: Re: OutOfMemoryError in PDExtendedGraphicsState#getLineDashPattern
To: users@pdfbox.apache.org
From: Christopher Schultz
Date: Tue, 20 Mar 2018 17:42:15 -0400

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Andreas,

On 3/20/18 5:35 PM, Andreas Hubold wrote:
> I'm getting an OutOfMemoryError from PDFBox when parsing a certain
> PDF using the Apache Tika App v 1.17 - which uses PDFBox 2.0.8
> internally. This is reproducible even with 8GB heap.
>
> The OutOfMemoryError happens in
> org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState#getLineDashPattern,
> which contains this piece of suspicious code:
>
> COSArray dp = (COSArray) dict.getDictionaryObject( COSName.D );
> if( dp != null ) {
>     COSArray array = new COSArray();
>     dp.addAll(dp);
>
> The last line seems to be wrong?

That certainly looks wrong to me.

> It appends all elements from 'dp' to 'dp' again, effectively
> duplicating the elements in the list. Maybe it should be
> 'array.addAll(dp)' or something like that?
>
> Can you confirm this being a bug? Should I open a JIRA ticket for
> this problem?
>
> Do you know a workaround to avoid the crash, e.g. an option to skip
> some parts of the file for text extraction?
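
To make the suspected problem concrete, here is a minimal sketch of the self-append pattern. It uses java.util.ArrayList as a stand-in for COSArray, which is an assumption made purely for illustration; the class and method names (SelfAppendDemo, buggyCopy, fixedCopy) are hypothetical and this is not PDFBox code. If the getter behaves like buggyCopy, every call would double the stored array, which might explain running out of memory even with a large heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelfAppendDemo {
    // Hypothetical stand-in for the suspicious getter: the stored list is
    // appended to itself, so it doubles in size while 'copy' stays empty.
    static List<Float> buggyCopy(List<Float> stored) {
        List<Float> copy = new ArrayList<>();
        stored.addAll(stored);
        return copy;
    }

    // What the quoted message suggests was probably intended:
    // fill the fresh list and leave the stored one untouched.
    static List<Float> fixedCopy(List<Float> stored) {
        List<Float> copy = new ArrayList<>();
        copy.addAll(stored);
        return copy;
    }

    public static void main(String[] args) {
        List<Float> dash = new ArrayList<>(Arrays.asList(3f, 2f));

        // Ten calls to the buggy variant: size grows as 2 * 2^10.
        for (int i = 0; i < 10; i++) {
            buggyCopy(dash);
        }
        System.out.println("after buggy calls: " + dash.size()); // 2048

        List<Float> fresh = new ArrayList<>(Arrays.asList(3f, 2f));
        for (int i = 0; i < 10; i++) {
            fixedCopy(fresh);
        }
        System.out.println("after fixed calls: " + fresh.size()); // 2
    }
}
```

With exponential growth like this, a few dozen calls on the same dictionary entry would be enough to exhaust any realistic heap, so raising the heap size alone is unlikely to help if this is indeed the cause.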
- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqxgDYdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFj/Ew/7BqHbZpfLea7necmh
zY6oOLIgLRwoarm61rWt8Kz6+Z+SGgU/8x5exQvJoZh8UhBG/sJ3OBIpdx5utMVM
/XsvEj8k0CEMPLnvhq5D+akszJbfB3GWZgwZVdhUq6tMbWKPrXVqlJ4/boLBlWYY
gOdkIkkULFuJtdk8rQ8GctbBmMnraSCyEvShLuuVOOi/m0MOMJnHIO6Ul6odWxWr
gDLVsT4UXVb6G2fDDeTx9LkadOalAFDAbSNlH+MwI/uoA3L9o9Vs7Hz8LE5pt4ds
ATBMS44hm+mk46t41VCD+dWP5adsJyZdzcZW+td0TUVGskeTHGfQ1uqDbBlFWyyA
n06sqi5xFnJvO/nCAl8lX0P8xPhJG1xi1/oF4vHAr3LzwxELE5U5oV+l2Qk06Sdc
RUNMuEyruiDlxj0Xm4xOnyy0X08RWjIp0XPyYW7DpGNIFxd+Wq/RC2ybUtSi2Ek7
2b5bd4rvk1jXdkEoBol/UB2rhNYDQUyqNPwU1ManA1coaHhqPRpDo8j4J0+ika9p
+qsdsgRqOu5oIzBHE8uLnW+ViuAuuFDNGySWgbxdelrARXGj/1MgTaFqQUKjNwHg
qFdZ9P29Kwv+oqQvJdkPpre9YoP2EJI49gV5EBakerM5/6BY+4wV03pNhtwoSL0r
tr/qb0cGpzAr+2kKZsohQYDjEa0=
=OFd7
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org