pdfbox-users mailing list archives

From Hartmann Toël <Toel.Hartm...@elanders.com>
Subject Re: Extract Text from page object?
Date Thu, 11 May 2017 11:48:32 GMT
(a) yes
(b) yes

very basic example code:
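    import java.io.StringWriter;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    // PDFBox 2.x; 'file' is a java.io.File. try-with-resources closes the document.
    try (PDDocument doc = PDDocument.load(file)) {
        int nbPages = doc.getNumberOfPages();
        PDFTextStripper stripper = new PDFTextStripper();
        StringWriter out = new StringWriter();
        stripper.writeText(doc, out); // text of all pages at once
        String txt = out.toString().trim();
    }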
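For (b) specifically: writeText/getText extract the whole document by default, but PDFTextStripper can be restricted to a page range with setStartPage/setEndPage (1-based, inclusive). A minimal sketch, assuming PDFBox 2.x on the classpath; the class name PerPageText and the method extractPages are just illustrative names, not part of PDFBox. Note that each page is stripped independently, so a word hyphenated across a page break ends up split between two entries and would need to be rejoined by your own code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Illustrative helper (hypothetical name), not a PDFBox class.
public class PerPageText {

    /** Returns the extracted text of each page, one list entry per page. */
    public static List<String> extractPages(PDDocument doc) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        List<String> pages = new ArrayList<>();
        // Page numbers for the stripper are 1-based and inclusive.
        for (int p = 1; p <= doc.getNumberOfPages(); p++) {
            // Restrict extraction to exactly one page.
            stripper.setStartPage(p);
            stripper.setEndPage(p);
            pages.add(stripper.getText(doc));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // PDFBox 2.x loading API; PDFBox 3.x uses Loader.loadPDF(file) instead.
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            List<String> pages = extractPages(doc);
            for (int i = 0; i < pages.size(); i++) {
                System.out.println("--- page " + (i + 1) + " ---");
                System.out.println(pages.get(i).trim());
            }
        }
    }
}
```

Run it as "java PerPageText some.pdf" to print each page's text under its own marker.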

Please check the sample code included in PDFBox for better examples.

Best regards
Toël Hartmann

On 11 May 2017, at 12:47, David Patterson <patterd20850@gmail.com> wrote:

> Is it possible to
> (a) iterate over the PDF by page [I believe the answer is "Yes"]
> (b) extract the text from a page [Don't know]
> This would allow some nice capabilities, but with an added complexity of
> words that split between pages.
> Thanks for the info.
> Dave Patterson

