pdfbox-users mailing list archives

From "Amir H. Jadidinejad" <amir.jad...@yahoo.com.INVALID>
Subject How to find the position of a specific paragraph in the input PDF?
Date Mon, 04 Aug 2014 01:53:12 GMT

I'm trying to extract the content of a PDF file using the PDFBox library. The content should be
processed paragraph by paragraph, and for each paragraph I need its position for follow-up
processing. Using the following code, I can extract the whole content of an input PDF:

PDDocument doc = PDDocument.load(file);
PDFTextStripper stripper = new PDFTextStripper();
String txt = stripper.getText(doc);

I have two problems:

    1. I don't know how to extract the content paragraph by paragraph.
    2. I don't know how to store the position of each paragraph for follow-up processing
       (for example, highlighting).
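
One possible direction, offered only as a sketch: PDFTextStripper has a protected
writeString(String, List<TextPosition>) hook that a subclass can override to see each run of
text together with its glyph positions. Assuming those positions have already been reduced to
one bounding box per text line (top-left origin, y growing downward), the lines can be grouped
into paragraphs by vertical gap. The class below uses no PDFBox types. ParagraphGrouper, Line,
Paragraph and the gapFactor threshold are hypothetical names chosen for illustration, and the
gap heuristic is an assumption that may misfire on multi-column or tightly spaced layouts.

```java
import java.util.ArrayList;
import java.util.List;

final class ParagraphGrouper {

    /** One text line with its bounding box; y is the top edge and grows downward. */
    static final class Line {
        final String text;
        final float x, y, width, height;
        Line(String text, float x, float y, float width, float height) {
            this.text = text; this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /** A paragraph: the joined line text plus the union of the lines' boxes. */
    static final class Paragraph {
        final String text;
        final float x, y, width, height;
        Paragraph(String text, float x, float y, float width, float height) {
            this.text = text; this.x = x; this.y = y; this.width = width; this.height = height;
        }
        @Override public String toString() {
            return "[" + x + "," + y + " " + width + "x" + height + "] " + text;
        }
    }

    /**
     * Groups lines (already in reading order) into paragraphs. A new paragraph starts
     * when the vertical gap to the previous line exceeds gapFactor times that line's height.
     */
    static List<Paragraph> group(List<Line> lines, float gapFactor) {
        List<Paragraph> paragraphs = new ArrayList<>();
        List<Line> current = new ArrayList<>();
        for (Line line : lines) {
            if (!current.isEmpty()) {
                Line prev = current.get(current.size() - 1);
                float gap = line.y - (prev.y + prev.height);
                if (gap > gapFactor * prev.height) {
                    paragraphs.add(merge(current));
                    current.clear();
                }
            }
            current.add(line);
        }
        if (!current.isEmpty()) {
            paragraphs.add(merge(current));
        }
        return paragraphs;
    }

    /** Joins the line texts with spaces and computes the enclosing rectangle. */
    static Paragraph merge(List<Line> lines) {
        float left = Float.MAX_VALUE, top = Float.MAX_VALUE;
        float right = -Float.MAX_VALUE, bottom = -Float.MAX_VALUE;
        StringBuilder text = new StringBuilder();
        for (Line l : lines) {
            left = Math.min(left, l.x);
            top = Math.min(top, l.y);
            right = Math.max(right, l.x + l.width);
            bottom = Math.max(bottom, l.y + l.height);
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(l.text);
        }
        return new Paragraph(text.toString(), left, top, right - left, bottom - top);
    }

    public static void main(String[] args) {
        List<Line> lines = new ArrayList<>();
        lines.add(new Line("First line of paragraph one,", 72f, 100f, 300f, 10f));
        lines.add(new Line("continued here.", 72f, 112f, 120f, 10f));
        lines.add(new Line("Paragraph two starts here.", 72f, 140f, 280f, 10f));
        for (Paragraph p : group(lines, 0.8f)) {
            System.out.println(p);
        }
    }
}
```

Each resulting Paragraph keeps a rectangle that could later be used for highlighting. The
page number would need to be stored alongside it, which this sketch leaves out.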
