pdfbox-users mailing list archives

From Александр Свиридов <ooo_satu...@mail.ru>
Subject PdfBox - extra symbols when converting pdf page to image
Date Sun, 28 Jun 2015 08:49:39 GMT

I use Apache PDFBox 1.8.9. I have a one-page PDF file that contains text, and I want to convert
this page to an image. I created the PDF file with LibreOffice. I use the following code:
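PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
int page = 0;
for (PDPage pdPage : pdPages) {
    // Render each page at 300 dpi and write it out as a PNG
    BufferedImage bim = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
    ImageIOUtil.writeImage(bim, "png", "/home/file" + "-" + page, 300);
    page++; // advance the counter so each page gets its own file name
}
document.close();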
The code works and I get a PNG image. The problem is that the image contains many strange extra
symbols that make the text very difficult to read. How can I fix this?
The image is here: http://i.stack.imgur.com/OUyLO.png

Александр Свиридов