pdfbox-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Re: Relation between COS and PD model
Date Wed, 03 Mar 2010 14:11:19 GMT

2010/3/3 Andreas Lehmkühler <andreas@lehmi.de>:
> From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
>> How will caching PD objects synchronize their cached PD objects with
>> underlying COS data changed by other PD objects?
> I don't remember a concrete example, but I'm sure that there are a few. But I think the
> solution is obvious. You just have to reinitialize your cached value when calling the
> corresponding setter.

See PDFont.get/setEncoding for a good example of this.
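
For readers without the source at hand, the reset-on-set pattern discussed above might look roughly like the sketch below. This is not the actual PDFont code: the class CachingFontSketch, the plain Map standing in for a COS dictionary, and the string-valued "Encoding" entry are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a PD wrapper; not the real PDFont.
class CachingFontSketch {
    // Assumption: a plain Map plays the role of the underlying COS dictionary.
    private final Map<String, String> dictionary;
    // Cached, "parsed" form of the Encoding entry; null means not loaded yet.
    private String cachedEncoding;

    CachingFontSketch(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    public String getEncoding() {
        if (cachedEncoding == null) {
            // Stand-in for the expensive step of building a PD-level value.
            String raw = dictionary.get("Encoding");
            cachedEncoding = (raw == null) ? null : raw.trim();
        }
        return cachedEncoding;
    }

    public void setEncoding(String encoding) {
        dictionary.put("Encoding", encoding);
        // Drop the cached value so the next getter call rebuilds it.
        cachedEncoding = null;
    }
}
```

The cache stays correct here only as long as every change to the entry goes through setEncoding on this same wrapper instance.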

The problem that I believe Johannes is referring to is that there's
currently no way for the PD object to know when the underlying COS
object (typically a dictionary) is changed, which makes all the
current caching solutions a bit brittle. This is also why I was
opposed to the earlier idea of extending the current COSObjectable
mechanism and would in fact prefer to avoid it as much as possible.
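
A minimal sketch of that brittleness, again with hypothetical names (StaleCacheSketch, Wrapper, and a plain Map in place of a COS dictionary, none of which are PDFBox classes): two wrappers share one dictionary, and a change made through one of them never reaches the other's cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in PDFBox.
class StaleCacheSketch {
    static class Wrapper {
        private final Map<String, String> dictionary; // stand-in for a COS dictionary
        private String cached;                        // null means not loaded yet

        Wrapper(Map<String, String> dictionary) {
            this.dictionary = dictionary;
        }

        String getValue() {
            if (cached == null) {
                cached = dictionary.get("Key");
            }
            return cached;
        }

        void setValue(String value) {
            dictionary.put("Key", value);
            cached = value; // only this wrapper's cache is refreshed
        }
    }

    // Returns { value seen through wrapper a, value stored in the dictionary }.
    static String[] demo() {
        Map<String, String> shared = new HashMap<>();
        shared.put("Key", "old");
        Wrapper a = new Wrapper(shared);
        Wrapper b = new Wrapper(shared);
        a.getValue();      // a caches "old"
        b.setValue("new"); // the dictionary changes; a is not notified
        return new String[] { a.getValue(), shared.get("Key") };
    }

    public static void main(String[] args) {
        String[] result = demo();
        // prints "cached view: old, dictionary: new"
        System.out.println("cached view: " + result[0] + ", dictionary: " + result[1]);
    }
}
```

Resetting the cache in the setter does not help wrapper a here, because the change never passes through its setter.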

PS. I've been trying (see PDFBOX-626) to reduce the memory impact of
the full COS object hierarchy that we keep in memory for all PDF
documents, but it looks like there are no more big improvements to be
made without some radical design changes. One thing I've been
considering is making the PD model the canonical data layer and using
COS objects only during parsing and serialization. This should give us
dramatic memory improvements for text extraction and rendering use
cases, but may be troublesome for all use cases where existing PDF
documents are being modified. Perhaps we should consider creating an
optimized "read only" version of PDFBox in addition to the fully
featured version we now have.


Jukka Zitting
