pdfbox-users mailing list archives

From Arthur Wang <arthurwang2...@hotmail.com>
Subject can pdfbox clean the scanned pdf
Date Thu, 07 Jun 2018 17:49:21 GMT

Hi all,

I tried to convert a scanned image file (see attached: original_image.png) into a PDF (see
attached: converted_pdf) using the ImageToPDF example code. After some adjustment it works
well, but the converted PDF still shows some grey or dark marks. Is there any way to clean
them up? I have seen commercial software that can scan a Home Depot receipt into a very clean
PDF, and I am not sure whether PDFBox can do the same. Would I need an OCR package to process
the image further?

I have also copied the code I used below. The PDFBox version is 2.0.9.

Thanks for any comments,



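try (PDDocument doc = new PDDocument())
{
    PDPage page = new PDPage();
    doc.addPage(page); // the page must be added to the document to appear in the output

    PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);

    // draw the image at half size, with its lower-left corner at (x=-20, y=-80)
    try (PDPageContentStream contents = new PDPageContentStream(doc, page))
    {
        contents.drawImage(pdImage, -20, -80, pdImage.getWidth() / 2f, pdImage.getHeight() / 2f);
    }
    doc.save(pdfPath); // pdfPath: output file path, assumed here (not in the snippet as posted)
}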
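For illustration only, one possible way to reduce grey marks is to threshold the scan to pure black and white with plain Java2D before embedding it. This is a minimal sketch and not a PDFBox feature: the class name ScanCleaner, the method binarize, and the threshold value are assumptions, and a fixed global threshold may not suit unevenly lit scans.

```java
import java.awt.image.BufferedImage;

public class ScanCleaner {

    /**
     * Returns a 1-bit black-and-white copy of src. Pixels whose luminance
     * is at or above threshold (0-255) become white; all others become black.
     */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (Rec. 601 weights).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                // Opaque white or opaque black in ARGB form.
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

The input could be loaded with ImageIO.read(new File(imagePath)), and the result passed to LosslessFactory.createFromImage(doc, cleaned) in place of PDImageXObject.createFromFile. A threshold around 160 is only a starting guess; light grey smudges typically need a lower value than dark ones, at the risk of thinning faint text.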

