pdfbox-users mailing list archives

From Frank van der Hulst <drifter.fr...@gmail.com>
Subject PageDrawer bug?
Date Mon, 29 Sep 2014 08:56:29 GMT
Hi all,

I'm new to the list... I beg your indulgence if I'm out of line here, but
here goes...

I'm working on a PDF table extractor.  This is my second attempt at it, and
this one is based on extending PageDrawer.

In particular, I'm looking for table cells delineated by vertical &
horizontal lines, and then grabbing whatever text is inside the rectangle.
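Here is a rough illustration of the cell-finding step described above, as a minimal stdlib-only sketch. It assumes the ruling lines have already been captured as segments, for example by recording them in overridden fillPath()/strokePath() calls, with thin filled rectangles reduced to segments. It also assumes lines and text positions share one coordinate space. The class TableCellFinder, its methods and the 1-unit tolerance are hypothetical and are not PDFBox API.

```java
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: turns captured ruling lines into table-cell rectangles. */
final class TableCellFinder {
    // Assumed tolerance, in the same units as the line coordinates.
    static final double TOL = 1.0;

    static boolean isHorizontal(Line2D.Double l) {
        return Math.abs(l.y1 - l.y2) <= TOL && Math.abs(l.x1 - l.x2) > TOL;
    }

    static boolean isVertical(Line2D.Double l) {
        return Math.abs(l.x1 - l.x2) <= TOL && Math.abs(l.y1 - l.y2) > TOL;
    }

    /** Crudely snaps a coordinate to the tolerance grid so near-equal values merge. */
    static double snap(double v) {
        return Math.round(v / TOL) * TOL;
    }

    /** True if a single captured line spans the whole edge from (ax, ay) to (bx, by). */
    static boolean covered(List<Line2D.Double> lines, double ax, double ay, double bx, double by) {
        boolean horizontalEdge = ay == by;
        for (Line2D.Double l : lines) {
            if (horizontalEdge && isHorizontal(l) && Math.abs(l.y1 - ay) <= TOL
                    && Math.min(l.x1, l.x2) <= Math.min(ax, bx) + TOL
                    && Math.max(l.x1, l.x2) >= Math.max(ax, bx) - TOL) {
                return true;
            }
            if (!horizontalEdge && isVertical(l) && Math.abs(l.x1 - ax) <= TOL
                    && Math.min(l.y1, l.y2) <= Math.min(ay, by) + TOL
                    && Math.max(l.y1, l.y2) >= Math.max(ay, by) - TOL) {
                return true;
            }
        }
        return false;
    }

    /** Returns every grid rectangle whose four edges are each covered by a captured line. */
    static List<Rectangle2D.Double> findCells(List<Line2D.Double> lines) {
        TreeSet<Double> xs = new TreeSet<>();
        TreeSet<Double> ys = new TreeSet<>();
        for (Line2D.Double l : lines) {
            if (isVertical(l)) xs.add(snap(l.x1));
            else if (isHorizontal(l)) ys.add(snap(l.y1));
        }
        List<Double> xList = new ArrayList<>(xs);
        List<Double> yList = new ArrayList<>(ys);
        List<Rectangle2D.Double> cells = new ArrayList<>();
        for (int i = 0; i + 1 < xList.size(); i++) {
            for (int j = 0; j + 1 < yList.size(); j++) {
                double x0 = xList.get(i), x1 = xList.get(i + 1);
                double y0 = yList.get(j), y1 = yList.get(j + 1);
                if (covered(lines, x0, y0, x1, y0) && covered(lines, x0, y1, x1, y1)
                        && covered(lines, x0, y0, x0, y1) && covered(lines, x1, y0, x1, y1)) {
                    cells.add(new Rectangle2D.Double(x0, y0, x1 - x0, y1 - y0));
                }
            }
        }
        return cells;
    }

    /** Returns the first cell containing the text point (x, y), or null if none does. */
    static Rectangle2D.Double cellAt(List<Rectangle2D.Double> cells, double x, double y) {
        for (Rectangle2D.Double c : cells) {
            if (c.contains(x, y)) return c;
        }
        return null;
    }
}
```

A cell is only reported when one captured line covers each of its four edges. Merged cells (a missing interior border) and borders drawn as many short pieces are therefore not detected; a fuller version would merge collinear segments first.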

This works well for most PDFs I've tried (admittedly all from the same
source), but there's a large subset that it doesn't work on. I've debugged
my way through one, and it appears that when processStream(page,
page.findResources(), page.getContents().getStream()) calls fillPath() or
strokePath() to draw the lines, they aren't drawn in the correct place.
They seem to be offset some distance down the page.
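One thing worth ruling out is whether the line coordinates and the text coordinates are in the same space. This is an assumption to check, not a diagnosis. PDF user space puts the origin at the lower-left with y increasing upward, and the page box (MediaBox/CropBox) does not have to start at (0, 0). If one side flips y using the box's upper edge and the other uses the page height, everything moves by a constant amount. The sketch below uses only java.awt.geom; the PageCoords class and its methods are hypothetical.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Hypothetical helper: maps PDF user-space points to a top-left-origin space. */
final class PageCoords {
    /**
     * For a page box [llx lly urx ury]: x' = x - llx, y' = ury - y,
     * so the box's top-left corner (llx, ury) maps to (0, 0) and y grows downward.
     */
    static AffineTransform userToTopLeft(double llx, double ury) {
        AffineTransform t = new AffineTransform();
        t.translate(-llx, ury); // applied to points second
        t.scale(1, -1);         // applied to points first: flip the y axis
        return t;
    }

    static Point2D toTopLeft(AffineTransform t, double x, double y) {
        return t.transform(new Point2D.Double(x, y), null);
    }
}
```

For a box [0 36 612 828], the point (100, 828) maps to (100, 0). Flipping with the page height (792) instead of ury would put it at y = -36, shifting every element by the box's lly value.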

I've looked at a couple of my troublesome PDFs, and one thing they have in
common is that they are v1.4, whereas the ones that work are v1.7.
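To sort a batch of files by version, the header can be read without any PDF library. The minimal sketch below assumes Java 11+ and that the "%PDF-x.y" marker appears within the first 1024 bytes; PdfHeaderVersion is a hypothetical name. From PDF 1.4 on, a /Version entry in the document catalog may override the header, so the header alone may not tell the whole story.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reports the version from a PDF's "%PDF-x.y" header. */
final class PdfHeaderVersion {
    /** Returns e.g. "1.4" from the header bytes, or null if no valid header is found. */
    static String headerVersion(byte[] head) {
        // ISO-8859-1 maps each byte to one char, so binary bytes don't break indexing.
        String s = new String(head, StandardCharsets.ISO_8859_1);
        int i = s.indexOf("%PDF-");
        if (i < 0 || i + 8 > s.length()) return null;
        String v = s.substring(i + 5, i + 8);
        return v.matches("\\d\\.\\d") ? v : null;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                byte[] head = in.readNBytes(1024); // Java 11+
                System.out.println(name + ": " + headerVersion(head));
            }
        }
    }
}
```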

Sooo... Has anyone encountered this before? Is there a known bug with
PageDrawer.processStream() or perhaps with the PDFStreamEngine and drawing
of v1.4 PDFs?

I'm happy to share my source code and example PDFs with anyone if it would
help.

Thanks

Frank
