pdfbox-users mailing list archives

From Shriram <shriram.g...@yahoo.com>
Subject Extracting text between two bookmarks using Apache PdfBox
Date Tue, 06 Mar 2012 07:34:26 GMT
I am using Apache PDFBox to read a PDF document whose hierarchy is defined by its bookmarks.
The hierarchy is a tree, with content only at the leaf level. When I try to extract the text
between two leaf-level bookmarks (using Stripper.setStartBookmark(), Stripper.setEndBookmark()
and Stripper.writeText()), I get the text of the whole page instead.
In short, my problem is similar to the one described at http://www.java-forums.org/advanced-java/51032-pdox-1-6-0-extract-text-between-2-bookmarks-same-page-sos.html

Is there a way to extract only the contents between two bookmarks? If so, what should I change
in my code?
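
For illustration, here is a minimal sketch of one possible workaround, not a PDFBox feature. It assumes the stripper has already returned the text of the whole page (as described above) and that each bookmark title also appears verbatim in that text, which may not hold for every document. The class name BookmarkTextTrimmer and the method between() are hypothetical helpers that use plain string handling only, with no PDFBox calls.

```java
public class BookmarkTextTrimmer {

    /**
     * Returns the part of pageText that follows the first occurrence of
     * startTitle and precedes the next occurrence of endTitle.
     * Returns an empty string if startTitle is not found. If endTitle is
     * null or not found after startTitle, returns everything after startTitle.
     * Assumption: the bookmark titles occur verbatim in the extracted text.
     */
    public static String between(String pageText, String startTitle, String endTitle) {
        int start = pageText.indexOf(startTitle);
        if (start < 0) {
            return "";
        }
        int from = start + startTitle.length();
        // Search for the end title only after the start title.
        int to = (endTitle == null) ? -1 : pageText.indexOf(endTitle, from);
        if (to < 0) {
            to = pageText.length();
        }
        return pageText.substring(from, to).trim();
    }

    public static void main(String[] args) {
        // Stand-in for the whole-page text the stripper wrote out.
        String pageText = "1.1 Scope\nThis section covers A.\n1.2 Terms\nThis section covers B.\n";
        System.out.println(between(pageText, "1.1 Scope", "1.2 Terms"));
    }
}
```

The trimming is done on the extracted string rather than inside the stripper, so it stays independent of the PDFBox version in use. It will cut at the wrong place if a title is repeated in the body text before the real heading.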