pdfbox-users mailing list archives

From Fill Freeman <fill.free...@gmail.com>
Subject Usage PDMarkedContentExtractor
Date Fri, 05 Sep 2014 13:14:27 GMT
Hello.
I'm new to PDFBox and I have a question.
As I understand it, a PDF file can contain a kind of HTML-like markup. Of
course, not all PDF files use it. Is it possible to use the
PDFMarkedContentExtractor class to extract such marked content?

For example, I have a PDF file with a table. I use the iText RUPS utility to
browse the structure of the PDF file, and I see a Table node with TR and TD
child nodes. As I understand it, the structure elements refer to page
content through MCID markers, and PDFMarkedContentExtractor should use them
to "extract" the text marked with a specified MCID. Am I right? If so, could
somebody show me a simple example of its usage? I have no idea how it is
supposed to work.
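
To make the MCID idea concrete, here is a minimal plain-Java sketch. It does
not use PDFBox and is not how the extractor class is implemented. It only
assumes that a page content stream wraps marked text between a BDC operator
(with an inline property list carrying /MCID) and an EMC operator, and that
the text is shown with simple (…) Tj strings. The class name McidSketch, the
method textByMcid, and the sample stream are hypothetical names made up for
illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class McidSketch {
    // Three token kinds, tried in this order at each position:
    //   group 1: "<< /MCID n >> BDC"  opens a marked-content sequence
    //   (none) : "EMC"                closes the innermost open sequence
    //   group 2: "(text) Tj"          shows a simple literal string
    // Assumption: no escaped parentheses and no TJ arrays (toy input only).
    private static final Pattern TOKEN = Pattern.compile(
        "<<\\s*/MCID\\s+(\\d+)\\s*>>\\s*BDC|\\bEMC\\b|\\(([^)]*)\\)\\s*Tj");

    // Groups the shown text by the MCID of the innermost enclosing sequence.
    static Map<Integer, String> textByMcid(String contentStream) {
        Map<Integer, String> result = new LinkedHashMap<>();
        Deque<Integer> open = new ArrayDeque<>();
        Matcher m = TOKEN.matcher(contentStream);
        while (m.find()) {
            if (m.group(1) != null) {
                open.push(Integer.parseInt(m.group(1)));   // BDC with MCID
            } else if (m.group(2) != null) {
                if (!open.isEmpty()) {                     // text inside a sequence
                    result.merge(open.peek(), m.group(2), String::concat);
                }
            } else if (!open.isEmpty()) {
                open.pop();                                // EMC
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical content stream for two table cells.
        String sample =
            "/TD <</MCID 0>> BDC BT /F1 12 Tf 72 700 Td (Name) Tj ET EMC\n"
          + "/TD <</MCID 1>> BDC BT /F1 12 Tf 200 700 Td (Price) Tj ET EMC\n";
        System.out.println(textByMcid(sample)); // prints {0=Name, 1=Price}
    }
}
```

In a tagged PDF, the TD elements seen in the structure tree would point at
these MCID values, so looking up a cell's text would amount to a lookup in
the returned map.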
