pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: Scratch files - too many files open
Date Wed, 03 Jun 2015 15:45:14 GMT

On 03.06.2015 at 13:20, Jesse Long wrote:
> On 03/06/2015 12:46, Andreas Lehmkühler wrote:
>> Hi,
>>> Jesse Long <jesse.long.za@gmail.com> wrote on 3 June 2015 at 08:45:
>>> On 02/06/2015 17:48, Andreas Lehmkuehler wrote:
>>>> Hi,
>>>> On 02.06.2015 at 16:15, Jesse Long wrote:
>>>>> Hi All,
>>>>> Regarding PDFBOX-2301, and the use of scratch files: right now, each
>>>>> COSStream uses one or two scratch files.
>>>>> I recently ran into the problem on Linux where the max number of open
>>>>> files allowed to the JVM by the OS was reached because of this.
>>>>> Is there a plan around this?
>>>>> Is it maybe that my use case is not expected?
>>>> I'm aware of that. The refactoring is still in progress. I expect to
>>>> reduce the number of open files.
>>>>> My use case is:
>>>>> Open PDDocument 1
>>>>> Open PDDocument 2
>>>>> for a few hundred times
>>>>>           import page 1 of PDDocument 1 into PDDocument 2 and overlay
>>>>>           some stuff on top.
>>>>> save PDDocument 2.
>>>>> I have written a patch to use one single java.io.RandomAccessFile as
>>>>> a scratch file per COSDocument, using pages in a doubly linked list to
>>>>> separate streams in the same file. Would you be interested in adding
>>>>> this to PDFBox?
>>>> Using only one file led to problems when creating PDFs from scratch.
>>>> It is possible to write to two COSStreams at the same time, which
>>>> corrupts the PDF.
>>> Hi Andreas,
>>> Do you mean at the same time as in multiple threads, or a single thread
>>> writing a bit to this stream and then a bit to another stream, back and
>>> forth?
>> It's about the second case. You can't add fonts and/or images to a page while
>> adding content to a content stream at the same time. You have to add those before
>> opening a stream or you have to close the stream before
>>> For the single thread use case, I have solved this in my patch.
>>> Actually, even multiple threads should be easy to support with
>>> synchronization. I'll work on some docs and submit and you can see if
>>> you like it.
>> At least it sounds interesting and I'm happy to look at it.
> Please see patch attached.
Looks promising, I'll have a deeper look later.

> Thanks,
> Jesse
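
For readers of this archive, the single-scratch-file idea discussed in the thread can be illustrated with a minimal sketch. This is not the attached patch and not PDFBox code: the class name PagedScratchFile, its nested Stream class, and the 4096-byte page size are all hypothetical. It also simplifies the design by keeping each stream's page list in memory rather than as a doubly linked list stored in the file. What it shows is that several logical streams can share one java.io.RandomAccessFile, and so one file descriptor, while a single thread writes to them alternately.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: many logical streams share one scratch file split into fixed-size pages. */
final class PagedScratchFile implements AutoCloseable {
    static final int PAGE_SIZE = 4096; // assumed page size, chosen for illustration only

    private final File file;
    private final RandomAccessFile raf;
    private int pageCount = 0; // number of pages handed out so far

    PagedScratchFile() throws IOException {
        file = File.createTempFile("scratch", ".tmp");
        raf = new RandomAccessFile(file, "rw");
    }

    /** Creates a new logical stream; it owns no pages until it is written to. */
    Stream newStream() {
        return new Stream();
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    /** One logical stream: an ordered list of page indexes plus its total length. */
    final class Stream {
        private final List<Integer> pages = new ArrayList<>();
        private long length = 0;

        /** Appends data, claiming a fresh page whenever the stream's last page is full. */
        void write(byte[] data) throws IOException {
            synchronized (PagedScratchFile.this) { // one lock guards the shared file position
                int off = 0;
                while (off < data.length) {
                    int inPage = (int) (length % PAGE_SIZE);
                    if (inPage == 0) {
                        pages.add(pageCount++); // new page at the end of the shared file
                    }
                    int n = Math.min(PAGE_SIZE - inPage, data.length - off);
                    long pos = (long) pages.get(pages.size() - 1) * PAGE_SIZE + inPage;
                    raf.seek(pos);
                    raf.write(data, off, n);
                    off += n;
                    length += n;
                }
            }
        }

        /** Reads the whole stream back by visiting its pages in order. */
        byte[] readAll() throws IOException {
            synchronized (PagedScratchFile.this) {
                byte[] out = new byte[(int) length];
                int off = 0;
                for (int page : pages) {
                    int n = Math.min(PAGE_SIZE, out.length - off);
                    raf.seek((long) page * PAGE_SIZE);
                    raf.readFully(out, off, n);
                    off += n;
                }
                return out;
            }
        }
    }
}
```

Because every stream only ever touches pages it owns, interleaved writes from one thread land in disjoint regions of the file. The synchronized blocks are one simple way to extend that to multiple threads, at the cost of serializing all I/O on the scratch file.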

