pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: OutOfMemoryError in PDExtendedGraphicsState#getLineDashPattern
Date Tue, 20 Mar 2018 21:42:15 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Andreas,

On 3/20/18 5:35 PM, Andreas Hubold wrote:
> I'm getting an OutOfMemoryError from PDFBox when parsing a certain
> PDF using the Apache Tika App v 1.17 - which uses PDFBox 2.0.8
> internally. This is reproducible even with 8GB heap.
> 
> The OutOfMemoryError happens in
> org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState#getLineDashPattern,
> which contains this piece of suspicious code:
> 
> COSArray dp = (COSArray) dict.getDictionaryObject( COSName.D );
> if( dp != null )
> {
>     COSArray array = new COSArray();
>     dp.addAll(dp);
> 
> The last line seems to be wrong?

That certainly looks wrong to me.

> It appends all elements from 'dp' to 'dp' again, effectively 
> duplicating the elements in the list. Maybe it should be 
> 'array.addAll(dp)' or something like that?
> 
> Can you confirm this being a bug? Should I open a JIRA ticket for
> this problem?
> 
> Do you know a workaround to avoid the crash, e.g. an option to skip
> some parts of the file for text extraction?
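
Here is a minimal stand-alone sketch of why this could exhaust any heap. It assumes that getLineDashPattern() runs again each time the graphics state is used and that /D is the same array object on every call. It uses plain java.util lists as stand-ins for COSArray, not the PDFBox API. DashPatternDemo, buggyCopy and fixedCopy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DashPatternDemo {

    // Mirrors the suspicious pattern: a fresh list is created but never
    // filled, while the shared list is appended to itself.
    static List<Integer> buggyCopy(List<Integer> shared) {
        List<Integer> copy = new ArrayList<>();
        // Snapshot first: List.addAll(this) is undefined by the List
        // contract, although java.util.ArrayList happens to double the list.
        shared.addAll(new ArrayList<>(shared));
        return copy; // still empty
    }

    // The fix suggested above ('array.addAll(dp)'): fill the fresh list
    // and leave the shared one untouched.
    static List<Integer> fixedCopy(List<Integer> shared) {
        List<Integer> copy = new ArrayList<>();
        copy.addAll(shared);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the /D entry, reused across calls the
        // way a cached dictionary object would be.
        List<Integer> buggyShared = new ArrayList<>(Arrays.asList(3, 2, 0));
        List<Integer> fixedShared = new ArrayList<>(Arrays.asList(3, 2, 0));

        for (int call = 0; call < 16; call++) {
            buggyCopy(buggyShared);
            fixedCopy(fixedShared);
        }

        // Each buggy call doubles the list: 3 * 2^16 entries.
        System.out.println("buggy: " + buggyShared.size()); // prints "buggy: 196608"
        System.out.println("fixed: " + fixedShared.size()); // prints "fixed: 3"
    }
}
```

Because every buggy call doubles the shared list, roughly 30 calls would push it past what any ArrayList (and any 8GB heap) can hold. If the real method really is invoked that often, this would fit the error persisting at 8GB. That call pattern is an assumption, not something I've verified in the PDFBox source.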

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqxgDYdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFj/Ew/7BqHbZpfLea7necmh
zY6oOLIgLRwoarm61rWt8Kz6+Z+SGgU/8x5exQvJoZh8UhBG/sJ3OBIpdx5utMVM
/XsvEj8k0CEMPLnvhq5D+akszJbfB3GWZgwZVdhUq6tMbWKPrXVqlJ4/boLBlWYY
gOdkIkkULFuJtdk8rQ8GctbBmMnraSCyEvShLuuVOOi/m0MOMJnHIO6Ul6odWxWr
gDLVsT4UXVb6G2fDDeTx9LkadOalAFDAbSNlH+MwI/uoA3L9o9Vs7Hz8LE5pt4ds
ATBMS44hm+mk46t41VCD+dWP5adsJyZdzcZW+td0TUVGskeTHGfQ1uqDbBlFWyyA
n06sqi5xFnJvO/nCAl8lX0P8xPhJG1xi1/oF4vHAr3LzwxELE5U5oV+l2Qk06Sdc
RUNMuEyruiDlxj0Xm4xOnyy0X08RWjIp0XPyYW7DpGNIFxd+Wq/RC2ybUtSi2Ek7
2b5bd4rvk1jXdkEoBol/UB2rhNYDQUyqNPwU1ManA1coaHhqPRpDo8j4J0+ika9p
+qsdsgRqOu5oIzBHE8uLnW+ViuAuuFDNGySWgbxdelrARXGj/1MgTaFqQUKjNwHg
qFdZ9P29Kwv+oqQvJdkPpre9YoP2EJI49gV5EBakerM5/6BY+4wV03pNhtwoSL0r
tr/qb0cGpzAr+2kKZsohQYDjEa0=
=OFd7
-----END PGP SIGNATURE-----


