pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance
Date Sat, 03 Feb 2018 22:03:50 GMT
On 02.02.2018 at 09:04, Serban Alexe wrote:
> Thanks for the hints, I'll look into both of them.
>
> I'm aware that it's not possible to obtain something that looks like the
> original PDF, I'm rather aiming for something as close as possible, at
> least from the content perspective.
>
> *As an alternative* I could settle for a solution that extracts each page
> from the pdf as an individual image. What options would I have in this case?

See here:
https://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
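
Once each page is available as a BufferedImage (in PDFBox 2.x,
PDFRenderer.renderImageWithDPI(pageIndex, dpi) returns one), embedding it in
the HTML as base64 needs only the JDK. The following is a minimal sketch, not
a definitive implementation: the class name PageImageHtml and the helper
toImgTag are hypothetical, and the blank image in main is only a stand-in for
a rendered page.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

class PageImageHtml {

    /**
     * Encodes one page image as PNG and wraps it in an img tag that carries
     * the data as a base64 data URI. altText is inserted as-is, so it must
     * not contain characters that need HTML escaping.
     */
    static String toImgTag(BufferedImage page, String altText) throws IOException {
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is found
        if (!ImageIO.write(page, "png", png)) {
            throw new IOException("No PNG writer available for this image type");
        }
        String base64 = Base64.getEncoder().encodeToString(png.toByteArray());
        return "<img alt=\"" + altText + "\" src=\"data:image/png;base64," + base64 + "\">";
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a rendered page. With PDFBox 2.x on the classpath this
        // would instead come from something like:
        //   new PDFRenderer(document).renderImageWithDPI(pageIndex, 150)
        BufferedImage page = new BufferedImage(100, 140, BufferedImage.TYPE_INT_RGB);
        String html = "<html><body>" + toImgTag(page, "page 1") + "</body></html>";
        System.out.println(html.length() + " characters of HTML");
    }
}
```

PNG is used here because it is lossless and keeps rendered text sharp; the
resulting HTML can get large, since base64 adds roughly a third to the size
of each image.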

Tilman


>
> Thanks.
>
>
>
> On 2018/02/01 16:14:00, Serban Alexe <s...@gmail.com> wrote:
>> Hello everybody,
>>
>> I need to write a Java class that converts a *.pdf* document to the html
>> format, preferably keeping the original formatting to the best extent
>> possible.
>> Also, I need to be able to extract the images (and preferably encode them
>> as base64 in the html file).
>>
>> *Can you please provide me some useful starting points and/or examples?*
>> Through google search, I was able to find some limited functionality
>> examples. None of these deal with images, and also my guess is that they
>> refer to some older version of the PDFBox suite...
>>
>> Thank you,
>>
>> Serban
>>

