pdfbox-users mailing list archives

From Christophe Vandeplas <christo...@vandeplas.com>
Subject OutOfMemory Exception because of huge colors
Date Mon, 26 Mar 2012 05:42:02 GMT
Hello List,


I'm working on a PDF scanning tool, and with one specific (malicious) PDF
I always get OutOfMemoryErrors.

The backtrace is:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.apache.pdfbox.filter.FlateFilter.decodePredictor(FlateFilter.java:218)
	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:170)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
	at ScanPdf.checkCOSBaseObject(ScanPdf.java:199)
        ...

Looking at the PDFBox code, FlateFilter.java:218 is:
byte[] lastline = new byte[rowlength];

In that context rowlength = 1073741838, which seems rather big (a single
byte array of roughly 1 GiB).
Looking further back in the code, it seems that the colors value is what
makes it so big. Colors appears to be read from the dictionary at
FlateFilter.java:96:
colors = dict.getInt(COSName.COLORS);

The (malicious) PDF does indeed contain the entry: /Colors 1073741838
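For reference, the PDF specification defines the size of one predictor row as ceil(Columns * Colors * BitsPerComponent / 8) bytes. The sketch below evaluates that formula; it is not a copy of PDFBox's own code, the class name is hypothetical, and Columns = 1 and BitsPerComponent = 8 are assumptions (they are the spec defaults), since only the /Colors value is known here. Under those assumptions it reproduces the rowlength value above.

```java
/**
 * Hypothetical helper: size in bytes of one PNG/TIFF predictor row,
 * ceil(Columns * Colors * BitsPerComponent / 8), per the PDF spec.
 * Uses long arithmetic so that values such as /Colors 1073741838
 * do not overflow an int.
 */
public class PredictorRowLength {

    public static long rowLengthBytes(long columns, long colors, long bitsPerComponent) {
        long bitsPerRow = columns * colors * bitsPerComponent;
        return (bitsPerRow + 7) / 8; // round up to whole bytes
    }

    public static void main(String[] args) {
        // Assumed Columns = 1, BitsPerComponent = 8 (spec defaults)
        long normal = rowLengthBytes(1, 3, 8);             // ordinary RGB sample
        long hostile = rowLengthBytes(1, 1073741838L, 8);  // /Colors 1073741838
        System.out.println("RGB row:     " + normal + " bytes");
        System.out.println("hostile row: " + hostile + " bytes (~"
                + (hostile >> 20) + " MiB)");
    }
}
```

With these inputs the hostile row comes out at 1073741838 bytes (about 1024 MiB), a single allocation that a default-sized heap will typically refuse.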

So my question is: is this something I need to catch in my own code, or
should PDFBox be patched to catch such issues (like the OutOfMemoryError
that is already caught at FlateFilter.java:124)?
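If the check ends up in the scanner's own code, one option is to validate the predictor parameters before calling getUnfilteredStream(). The sketch below is a hypothetical caller-side guard: the class name, method name, and 16 MiB limit are assumptions, not PDFBox API. In a real scanner the three values would come from the stream's /DecodeParms dictionary (e.g. via dict.getInt(COSName.COLORS), as at FlateFilter.java:96).

```java
import java.io.IOException;

/**
 * Hypothetical guard, not part of PDFBox: reject predictor parameters
 * whose row size would exceed a fixed limit, before any decoding.
 */
public class PredictorGuard {

    // Assumed cap on a single predictor row; tune for the workload.
    static final long MAX_ROW_BYTES = 16L * 1024 * 1024;

    public static void checkPredictorParams(int colors, int bitsPerComponent, int columns)
            throws IOException {
        if (colors < 1 || bitsPerComponent < 1 || columns < 1) {
            throw new IOException("Invalid predictor parameters: Colors=" + colors
                    + ", BitsPerComponent=" + bitsPerComponent + ", Columns=" + columns);
        }
        long bits;
        try {
            // multiplyExact throws instead of silently wrapping on overflow
            bits = Math.multiplyExact(Math.multiplyExact((long) columns, (long) colors),
                    (long) bitsPerComponent);
        } catch (ArithmeticException overflow) {
            throw new IOException("Predictor parameters overflow a long");
        }
        long rowBytes = bits / 8 + (bits % 8 != 0 ? 1 : 0); // round up, no overflow
        if (rowBytes > MAX_ROW_BYTES) {
            throw new IOException("Predictor row of " + rowBytes
                    + " bytes exceeds limit of " + MAX_ROW_BYTES);
        }
    }

    public static void main(String[] args) {
        try {
            checkPredictorParams(3, 8, 1024);        // plausible RGB image row
            System.out.println("RGB parameters accepted");
            checkPredictorParams(1073741838, 8, 1);  // /Colors value from the malicious PDF
            System.out.println("hostile parameters accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the stream up front means the scanner never attempts the 1 GiB allocation, which is arguably more predictable than relying on catching OutOfMemoryError after the fact.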


Thanks for your expertise
Christophe
