pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance
Date Sat, 03 Feb 2018 22:03:50 GMT
On 02.02.2018 at 09:04, Serban Alexe wrote:
> Thanks for the hints, I'll look into both of them.
> I'm aware that it's not possible to obtain something that looks like the
> original PDF, I'm rather aiming for something as close as possible, at
> least from the content perspective.
> *As an alternative* I could settle for a solution that extracts each page
> from the pdf as an individual image. What options would I have in this case?

See here:
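
For rendering each page to its own image, a minimal sketch with the PDFBox 2.0.x rendering API could look like the following. It is an illustration under assumptions, not necessarily what the reference above shows: the class name PdfPageRenderer, the 150 DPI setting, the "page-N.png" file names and the command-line arguments are all made up for the example. In PDFBox 3.x, PDDocument.load(File) is replaced by Loader.loadPDF(File).

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class; requires the PDFBox 2.0.x jar on the classpath.
final class PdfPageRenderer {

    /** Renders every page of an already opened document to an RGB image. */
    static List<BufferedImage> renderPages(PDDocument document, float dpi) throws IOException {
        PDFRenderer renderer = new PDFRenderer(document);
        List<BufferedImage> images = new ArrayList<>();
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            // Page indexes are 0-based.
            images.add(renderer.renderImageWithDPI(i, dpi, ImageType.RGB));
        }
        return images;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input PDF; args[1]: output directory, assumed to exist.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            List<BufferedImage> pages = renderPages(document, 150f);
            for (int i = 0; i < pages.size(); i++) {
                File out = new File(args[1], "page-" + (i + 1) + ".png");
                ImageIO.write(pages.get(i), "png", out);
            }
        }
    }
}
```

Collecting all pages in a list keeps the sketch short; for large documents it would likely be better to write each image to disk right after rendering it, so that only one page is held in memory at a time.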


> Thanks.
> On 2018/02/01 16:14:00, Serban Alexe <s...@gmail.com> wrote:
>> Hello everybody,
>> I need to write a Java class that converts a *.pdf* document to the html
>> format, preferably keeping the original formatting to the best extent
>> possible.
>> Also, I need to be able to extract the images (and preferably encode them
>> as base64 in the html file).
>> *Can you please provide me some useful starting points and/or examples ?*
>> Through google search, I was able to find some limited functionality
>> examples. None of these deal with images, and also my guess is that they
>> refer to some older version of the PDFBox suite...
>> Thank you,
>> Serban
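
Regarding the wish to encode images as base64 in the html file: once an image is available as a BufferedImage, it can be embedded through a "data:" URI. The following is only a sketch that uses the JDK alone; the class name ImageHtmlEmbedder, the method names and the choice of PNG as the embedded format are assumptions made for the example.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper class; uses only the JDK.
final class ImageHtmlEmbedder {

    /** Encodes the image as PNG and returns an <img> tag with a base64 data URI. */
    static String toImgTag(BufferedImage image, String altText) throws IOException {
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(image, "png", png)) {
            throw new IOException("No PNG writer available");
        }
        String base64 = Base64.getEncoder().encodeToString(png.toByteArray());
        return "<img alt=\"" + escapeAttribute(altText)
                + "\" src=\"data:image/png;base64," + base64 + "\">";
    }

    /** Escapes the characters that are unsafe inside a double-quoted HTML attribute. */
    static String escapeAttribute(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // A tiny blank image stands in for an image taken from a PDF.
        BufferedImage sample = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        System.out.println(toImgTag(sample, "sample image"));
    }
}
```

Base64 makes the data roughly one third larger than the binary image, so html files with many embedded page images can grow quickly.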

