pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Issue regarding the slow image rendering for certain PDF file using PDFToImage
Date Wed, 17 Jan 2018 18:40:21 GMT
Hi,

Your issue has been fixed. Try with a snapshot or wait for 2.0.9.
https://issues.apache.org/jira/browse/PDFBOX-4060

Tilman


On 09.01.2018 at 19:59, Tilman Hausherr wrote:
> On 09.01.2018 at 09:05, Tilman Hausherr wrote:
>> On 09.01.2018 at 07:10, Soon Keong Tan wrote:
>>> My team is having problems with the image rendering speed for certain
>>> PDF files. For most of the PDF files we handle, it takes only seconds
>>> to create an image of the file, but for certain PDFs it takes more than
>>> 6 minutes.
>>>
>>> We have tried the following versions of pdfbox-app-x.x.x.jar, and it
>>> seems that 1.8.x is more efficient at rendering the image:
>>>   (1) 1.8.13 - 1.5 mins
>>>   (2) 2.0.5  - 6.18 mins
>>>   (3) 2.0.8  - 6.35 mins
>>> However, because some Japanese characters were not rendered correctly
>>> in some files with 1.8.13, we had to use 2.0.5 as the production
>>> version.
>>>
>>> I tried inserting some debug code in the PDFToImage class (version 2.0.5)
>>> to find the bottleneck, and it seems that "renderer.renderImageWithDPI"
>>> takes up most of the time.
>>>
>>> ==========================
>>> Java version: 1.7.0_72
>>> PDFBox version: 2.0.5
>>> Command line: java -jar ./pdfbox-app-2.0.5.jar PDFToImage -time -startPage 1 -endPage 1 ./sample_slow.pdf
>>> File: https://goo.gl/WEMM2X
>>> ==========================
>>> The full version of the PDF is quite large, so the file linked above is
>>> a cropped version (the page we are having problems rendering). The
>>> cropped version was created using the PDFSplit command line tool.
>>>
>>> This is my first time using the mailing list. Should I create a JIRA
>>> ticket requesting help instead of asking on the mailing list about
>>> this problem?
>>
>>
>> It's fine to post to the mailing list first.
>>
>> I had a quick look at your file... it has 1999 probably identical
>> separation colorspaces that are just a black or white value. These
>> map to a CMYK colorspace.
>>
>> I'll look more later this week.
>>
>> Did you set / try the two settings mentioned here?
>> https://pdfbox.apache.org/2.0/getting-started.html
>
>
> I ran the profiler and the cause of the slowness is different... there
> is a large JPEG file (5349 x 3806) that uses a DeviceN colorspace,
> which in turn is based on CMYK. There's some slowness due to the type 0
> function that converts from DeviceN to CMYK, but this is less than 10%.
> Most of the time is spent converting the CMYK to RGB, one pixel at a
> time. (Because of the DeviceN colorspace we can't use bulk conversion,
> which may or may not be faster.)
>
> I've opened a JIRA issue 
> (https://issues.apache.org/jira/browse/PDFBOX-4060) but don't expect 
> this to be fixed soon. CMYK and ICC colorspaces are our weak spot :-(
>
> Tilman
>
>
>
>>
>> Tilman
>>
>>
>>>
>>> Any help is deeply appreciated. Thank you in anticipation.
>>>
>>> Regards,
>>> Soon Keong Tan
>>> ----------------------
>>> tansk.proj@gmail.com
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>
>
>
>
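
The original report located the slow step by adding debug timing around individual calls in PDFToImage. A minimal sketch of that kind of timing wrapper is shown here, using only the Java standard library. The class name StageTimer and the method timed are hypothetical and not part of PDFBox, and the lambda syntax assumes Java 8 or later (the report used Java 1.7, where an anonymous class would be needed instead). The stage in main is a stand-in computation; in PDFToImage the wrapped call would be the renderer.renderImageWithDPI call named in the report.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class StageTimer {

    // Runs one stage, prints its wall-clock duration to stderr,
    // and returns whatever the stage returned.
    public static <T> T timed(String label, Supplier<T> stage) {
        long start = System.nanoTime();
        try {
            return stage.get();
        } finally {
            long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.err.println(label + ": " + ms + " ms");
        }
    }

    public static void main(String[] args) {
        // Stand-in stage (an assumption for illustration): sums some numbers.
        // With PDFBox on the classpath, the wrapped call would instead be the
        // rendering call, e.g. renderer.renderImageWithDPI(pageIndex, dpi).
        long sum = timed("stand-in stage", () -> {
            long s = 0;
            for (int i = 0; i < 1_000_000; i++) {
                s += i;
            }
            return s;
        });
        System.out.println("result = " + sum);
    }
}
```

Wrapping each suspect call separately (loading, rendering, writing the image) makes it easy to see which one dominates before reaching for a profiler.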


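
To give a feel for what "one pixel at a time" means for the 5349 x 3806 image mentioned in the quoted analysis (roughly 20 million pixels), here is a rough sketch of a per-pixel CMYK to RGB loop. This is not PDFBox's implementation: the class CmykToRgbSketch is hypothetical, and the formula is the naive, non-ICC approximation, chosen only to keep the example self-contained. A real renderer would go through a color profile, which costs considerably more per pixel.

```java
public class CmykToRgbSketch {

    // Naive (non-ICC) CMYK -> RGB for a single pixel.
    // Components are 0..255; the result is packed as 0xRRGGBB.
    public static int toRgb(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return (r << 16) | (g << 8) | b;
    }

    // Converts an interleaved CMYK buffer (4 bytes per pixel),
    // calling toRgb once for every pixel.
    public static int[] convertPerPixel(byte[] cmyk) {
        int[] rgb = new int[cmyk.length / 4];
        for (int i = 0; i < rgb.length; i++) {
            int c = cmyk[4 * i] & 0xFF;
            int m = cmyk[4 * i + 1] & 0xFF;
            int y = cmyk[4 * i + 2] & 0xFF;
            int k = cmyk[4 * i + 3] & 0xFF;
            rgb[i] = toRgb(c, m, y, k);
        }
        return rgb;
    }

    public static void main(String[] args) {
        // Number of per-pixel conversions for the image size from the thread.
        long fullPixels = 5349L * 3806L;
        System.out.println("pixels in a 5349 x 3806 image: " + fullPixels);

        // Small synthetic buffer (all zeros, i.e. white) so the demo stays cheap.
        byte[] sample = new byte[100 * 100 * 4];
        long start = System.nanoTime();
        int[] out = convertPerPixel(sample);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(out.length + " pixels converted in " + micros + " us");
    }
}
```

Even with arithmetic this cheap, the loop body runs once per pixel, so any extra per-pixel cost (a function evaluation for DeviceN, a profile lookup for CMYK) is multiplied by the full pixel count.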

