pdfbox-users mailing list archives

From Robin Thomas Panicker <ro...@qburst.com>
Subject Re: Issue with PDF - Image conversion
Date Tue, 18 Jun 2013 05:38:46 GMT
Thanks a lot Gilad and Andreas,
I was out of town last week and hence could not reply.

I have attached the sample PDF and the image generated from it (first page
only).

If you compare the original PDF with the converted image, the words "The
pressures" and "The solution" are not rendered correctly in the converted
image. The rest of the image looks fine.

I have also attached some very crude Java code that performs the
PDF-to-image conversion as a standalone task.

Can you please let me know what could be causing the image issue?

Thanks,
Robin





On Tue, Jun 11, 2013 at 5:37 PM, Andreas Lehmkuehler <andreas@lehmi.de> wrote:

> Hi,
>
> On 10.06.2013 11:15, Robin Thomas Panicker wrote:
>
>> Thanks a lot, Gilad, for responding. I was not sure what more information
>> to provide. Now that you have asked for specific details, let me provide
>> you with more information.
>>
>> I am using the code below to convert the PDF to an image (trying to save
>> the first page of the PDF as an image file):
>>
>>   String pdfFile = "d:/hs/4.pdf";
>>   document = PDDocument.load( pdfFile );
>>
>>   List pages = document.getDocumentCatalog().getAllPages();
>>   PDPage page = ( PDPage ) pages.get( 0 );
>>   int width = ( int ) page.getArtBox().getWidth();
>>   int height = ( int ) page.getArtBox().getHeight();
>>   BufferedImage image = page.convertToImage( imageType, resolution );
>>
>>
>> On the machine (prod server) where the conversion DOES NOT work, I have
>> Ubuntu 12.04 and OpenOffice 3.0, while on the machine (development
>> machine) where the conversion works, I have Ubuntu 10.10 and
>> OpenOffice 3.0.
>>
>> On both machines I am using the same code, and the version of PDFBox on
>> both is 1.8.1.
>>
>> The issue I face is that the image conversion simply doesn't work
>> correctly (I can see parts of the image / text garbled or missing). There
>> are no errors or warnings in the log output.
>>
>> Please let me know if I can provide any more information to help in
>> understanding the problem.
>>
> Without a sample PDF this is just a guess:
>
> The fact that you are using OpenOffice 3.0 suggests that the PDF in
> question contains fonts as embedded subsets. Those are not fully supported
> by PDFBox, and there are several issues with that kind of font.
> As you are using different platforms (Ubuntu 10.10 vs. 12.04), you are most
> likely using different versions of the JDK (1.6 vs. 1.7). There are some
> 1.7-specific issues with embedded font subsets.
>
>
>> Thanks,
>> Robin
>>
>>
>>
>> On Mon, Jun 10, 2013 at 2:25 PM, Gilad Denneboom
>> <gilad.denneboom@gmail.com> wrote:
>>
>>> A lot of information is missing there... How are you converting the PDF
>>> files, exactly? What type of problems do you encounter? Which version of
>>> PDFBox do you use? And what does it have to do with your Office suite?
>>>
>>> Without more information it's impossible to help you with your problem.
>>>
>>>
>>> On Mon, Jun 10, 2013 at 8:22 AM, Robin Thomas Panicker <robin@qburst.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I am using PDFBox to convert PDF documents into images. However, on
>>>> some machines I am facing an issue. The conversion does not happen
>>>> correctly. I can see missing text / images etc.
>>>>
>>>> Please note that this happens only on a few machines. I use Ubuntu and
>>>> OpenOffice. I have tried a variety of combinations of different
>>>> versions of Ubuntu and OpenOffice (and even LibreOffice).
>>>>
>>>> However, I am unable to find out why it does not work on some machines.
>>>>
>>>> Can anyone please help?
>>>>
>>>> Thanks,
>>>> Robin
>>>>
>>>
> BR
> Andreas Lehmkühler
>
>
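
A minimal sketch for comparing the two machines, following Andreas's guess
that the JDK version and embedded font subsets are involved. It uses only the
Java standard library, not PDFBox. The class RenderEnvReport and its methods
are hypothetical names introduced here for illustration. The subset check
relies on the PDF convention that a subset font's name starts with a tag of
six uppercase letters and a plus sign (for example ABCDEF+Arial). Counting the
installed font families is an extra assumption that local fonts may also
differ between the machines, which the thread does not confirm.

```java
import java.awt.GraphicsEnvironment;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: gathers JVM details that can be
// compared between a machine where rendering works and one where it does not.
final class RenderEnvReport {

    // Subset font names carry a six-uppercase-letter tag followed by '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("[A-Z]{6}\\+.+");

    // True if the given base font name looks like an embedded subset.
    static boolean isSubsetFontName(String baseFontName) {
        return baseFontName != null && SUBSET_TAG.matcher(baseFontName).matches();
    }

    // JVM and OS properties, in a stable order so two reports can be diffed.
    static Map<String, String> jvmDetails() {
        Map<String, String> details = new LinkedHashMap<>();
        String[] keys = {"java.version", "java.vendor", "os.name", "os.version"};
        for (String key : keys) {
            details.put(key, System.getProperty(key, "unknown"));
        }
        return details;
    }

    // Number of font families visible to Java 2D, or -1 if lookup fails
    // (for example on a server without a usable font configuration).
    static int installedFontFamilyCount() {
        try {
            return GraphicsEnvironment.getLocalGraphicsEnvironment()
                    .getAvailableFontFamilyNames().length;
        } catch (Throwable t) {
            return -1;
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : jvmDetails().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
        System.out.println("font families = " + installedFontFamilyCount());
    }
}
```

Running main on both the development machine and the production server and
diffing the output would show whether the Java versions differ as suspected.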


-- 

Robin Panicker,
QBurst
www.qburst.com
Skype: Robin.at.qburst
