pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance
Date Thu, 01 Feb 2018 17:18:09 GMT

Please have a look at the PDFText2HTML class in the source code
download. There are also ExtractImages and PrintImageLocations
classes, but each of them works on its own. You'll never get something
that looks just like the PDF, because PDF and HTML are really two
different things.
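
For the base64 part of the question, here is a minimal sketch that needs only the JDK. It assumes you already have the raw bytes of an image (for example, obtained along the lines of what ExtractImages does) and know its MIME type. The class name DataUriImage and the method toImgTag are made up for illustration and are not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriImage {

    /**
     * Builds an HTML img tag whose src is a base64 data URI.
     * Hypothetical helper, not a PDFBox API.
     */
    public static String toImgTag(byte[] imageBytes, String mimeType) {
        // Standard (not URL-safe) base64, as used in data URIs.
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + encoded + "\"/>";
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a real extracted image.
        byte[] placeholder = "abc".getBytes(StandardCharsets.US_ASCII);
        // prints <img src="data:image/png;base64,YWJj"/>
        System.out.println(toImgTag(placeholder, "image/png"));
    }
}
```

Where each tag should go in the HTML produced from the text is a separate problem that this sketch does not address.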


On 01.02.2018 at 17:14, Serban Alexe wrote:
> Hello everybody,
> I need to write a Java class that converts a *.pdf* document to the html
> format, preferably keeping the original formatting to the best extent
> possible.
> Also, I need to be able to extract the images (and preferably encode them
> as base64 in the html file).
> *Can you please provide me some useful starting points and/or examples ? *
> Through google search, I was able to find some limited functionality
> examples. None of these deal with images, and also my guess is that they
> refer to some older version of the PDFBox suite...
> Thank you,
> Serban
