pdfbox-users mailing list archives

From Serban Alexe <serban.al...@gmail.com>
Subject Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance
Date Fri, 02 Feb 2018 08:04:39 GMT
Thanks for the hints, I'll look into both of them.

I'm aware that it's not possible to obtain output that looks exactly like the
original PDF; I'm aiming for something as close as possible, at least in
terms of content.

*As an alternative*, I could settle for a solution that extracts each page
of the PDF as an individual image. What options would I have in that case?

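For the page-as-image alternative, PDFBox 2.x offers `PDFRenderer` (package `org.apache.pdfbox.rendering`), whose `renderImageWithDPI(pageIndex, dpi)` returns one `java.awt.image.BufferedImage` per page. The sketch below covers only the JDK side: it takes already-rendered page images and embeds them as base64 PNG data URIs in a single HTML file, which also fits the base64 wish in the original question. The class and method names (`PageImagesToHtml`, `toPngDataUri`, `buildHtml`) are hypothetical, and the PDFBox calls appear only in a comment, so they are an assumption to check against the PDFBox version actually in use.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper: embeds rendered page images into one HTML document.
public class PageImagesToHtml {

    // Encodes one image as a PNG data URI usable in <img src="...">.
    static String toPngDataUri(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "data:image/png;base64,"
                + Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Builds one HTML page with one <img> per rendered PDF page.
    // Assumption: the images come from PDFBox 2.x, for example:
    //   PDFRenderer renderer = new PDFRenderer(document);
    //   BufferedImage img = renderer.renderImageWithDPI(i, 150);
    static String buildHtml(List<BufferedImage> pages) {
        StringBuilder html = new StringBuilder("<!DOCTYPE html>\n<html><body>\n");
        for (int i = 0; i < pages.size(); i++) {
            html.append("<img alt=\"page ").append(i + 1)
                .append("\" src=\"").append(toPngDataUri(pages.get(i)))
                .append("\">\n");
        }
        return html.append("</body></html>\n").toString();
    }

    public static void main(String[] args) {
        // Two blank placeholder images stand in for real rendered PDF pages.
        List<BufferedImage> pages = List.of(
                new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB),
                new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB));
        System.out.println(buildHtml(pages));
    }
}
```

Rendered pages are only pixels, so the text is no longer selectable or searchable. A higher DPI improves legibility but enlarges the HTML file, and base64 encoding adds roughly a third on top of the PNG size.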

On 2018/02/01 16:14:00, Serban Alexe <s...@gmail.com> wrote:
> Hello everybody,
>
> I need to write a Java class that converts a *.pdf* document to HTML,
> preferably keeping the original formatting as far as possible.
> Also, I need to be able to extract the images (and preferably encode them
> as base64 in the HTML file).
>
> *Can you please provide me some useful starting points and/or examples?*
> Through a Google search, I was able to find some examples with limited
> functionality. None of them deal with images, and my guess is that they
> refer to an older version of the PDFBox suite...
>
> Thank you,
> Serban
