pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Loading documents with a large amount of annotations
Date Tue, 13 Nov 2018 07:28:16 GMT
Hi Nick,

PDFBox doesn't support parsing on demand, so the only solution is to 
increase the JVM heap size (-Xmx).
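
As a rough illustration of the -Xmx advice, the sketch below reports the heap ceiling the JVM is actually running with, which can help confirm that the flag reached the process that loads the document. The class name HeapCheck, its methods, and the 2048 MiB threshold are hypothetical and not part of PDFBox; only the standard Runtime API is used.

```java
// Hypothetical helper (not part of PDFBox): reports the JVM heap ceiling.
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** True if the heap ceiling is at least requiredMiB. */
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        // Assumed threshold for illustration only; choose one that fits your documents.
        // Some garbage collectors report slightly less than the -Xmx value.
        long requiredMiB = 2048;
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        if (!hasAtLeast(requiredMiB)) {
            System.out.println("Heap ceiling is below " + requiredMiB
                    + " MiB; consider restarting with a larger -Xmx value.");
        }
    }
}
```

Started with, for example, `java -Xmx4g HeapCheck`, this should print a ceiling close to 4096 MiB. If the number is much lower, the option probably did not reach the JVM, for instance because a launcher script or build tool starts the process.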

Yes, you could hack into COSParser to ignore /Annots objects in a 
dictionary, but I don't know what mayhem would result.

Tilman

Am 13.11.2018 um 07:17 schrieb Nick Westerly:
> Hi -
>
> I am trying to load a document that has a lot of annotations (50k+), i.e.
> comments, highlights, etc. However, just calling 'load' on the document is
> extremely slow and uses a lot of memory (2G+).
>
> I actually don't need to use or access annotations at all (I'm using PDFBox
> through a separate library that doesn't need them), but I do need access to
> the PDDocument. Is there a way to load a document but ignore all
> annotations when parsing? Similarly, can items such as fonts associated
> with those annotation objects be ignored?
>
> I was browsing through PDFParser#initialParse and COSParser, but I am a
> little out of my depth.
> Even something as simple as ignoring objects of some 'type' that I could
> check would help.
>
> Any suggestions, even partial, would be helpful.
>
> Thanks.
>
> Nick
>

