pdfbox-dev mailing list archives

From Edson Alves Pereira <lottal...@gmail.com>
Subject Re: Extracting text between two bookmarks using Apache PdfBox
Date Tue, 06 Mar 2012 18:13:32 GMT
It is possible that your whole page sits inside a single bookmark. Check how
your PDF is structured, in particular which page each bookmark points to.
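
If both bookmarks turn out to point at the same page, one possible workaround
is to take the page text you already get and cut it down yourself. The sketch
below is plain Java and does not call PDFBox at all. It assumes the bookmark
titles appear verbatim as headings in the extracted text, which may not hold
for your document. BookmarkSlice and between() are hypothetical names, not
part of any library.

```java
public class BookmarkSlice {

    /**
     * Returns the text that follows the first occurrence of startTitle and
     * precedes the next occurrence of endTitle. If endTitle is not found,
     * everything after startTitle is returned. If startTitle is not found,
     * an empty string is returned.
     *
     * Assumption: both titles appear verbatim in pageText.
     */
    public static String between(String pageText, String startTitle, String endTitle) {
        int start = pageText.indexOf(startTitle);
        if (start < 0) {
            return "";
        }
        int from = start + startTitle.length();
        int end = pageText.indexOf(endTitle, from);
        if (end < 0) {
            end = pageText.length();
        }
        return pageText.substring(from, end).trim();
    }

    public static void main(String[] args) {
        // Stand-in for the whole-page text that the stripper returned.
        String pageText = "1.1 Scope\nApplies to all units.\n1.2 Terms\nUnit: a device.\n";
        System.out.println(between(pageText, "1.1 Scope", "1.2 Terms"));
    }
}
```

In your code, pageText would be the string written by Stripper.writeText(),
and the two titles would come from the start and end bookmarks. Matching on
titles is fragile if a title is repeated elsewhere on the page.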

On Tue, Mar 6, 2012 at 4:26 AM, Shriram <shriram.goal@yahoo.com> wrote:

> I am using Apache PDFBox to read a PDF document which has a hierarchy,
> which is defined by the bookmarks. The hierarchy is in a tree form with
> contents only at the leaf level. When I try to extract the text between two
> leaf-level bookmarks (using Stripper.setStartBookmark(),
> Stripper.setEndBookmark() and Stripper.writeText()), I get the text of the
> whole page instead. In short, my problem is similar to that mentioned in
> http://www.java-forums.org/advanced-java/51032-pdox-1-6-0-extract-text-between-2-bookmarks-same-page-sos.html
> Is there a way to extract the contents between two bookmarks? If so, what
> should be the change in my code?
