pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PageDrawer bug?
Date Mon, 29 Sep 2014 16:04:23 GMT
Hi,

The best approach is to upload the code and the PDFs to a public location.

PDF is not easy... the coordinates that you see in the content stream are
always relative to the current transformation matrix (CTM), so the same
numbers can end up at different places on the page.
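
To illustrate the point, here is a minimal sketch in plain Java (no PDFBox
calls) of how a "cm" operator changes where a later coordinate lands. The
class and method names (CtmSketch, applyCm, toPageSpace) are made up for this
example, and starting from an identity matrix is a simplification. It is not
a diagnosis of the PDFs in question, only a picture of the mechanism.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class CtmSketch {

    /** Applies the operands [a b c d e f] of a PDF "cm" operator to the CTM. */
    static void applyCm(AffineTransform ctm, double a, double b, double c,
                        double d, double e, double f) {
        // PDF premultiplies the new matrix onto the CTM. concatenate() has the
        // same effect: points pass through the new matrix first, then the old CTM.
        ctm.concatenate(new AffineTransform(a, b, c, d, e, f));
    }

    /** Maps a coordinate pair as written in the stream through the CTM. */
    static Point2D toPageSpace(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        AffineTransform ctm = new AffineTransform(); // assumed identity at start
        applyCm(ctm, 1, 0, 0, 1, 0, -100);           // stream: 1 0 0 1 0 -100 cm
        Point2D p = toPageSpace(ctm, 50, 700);       // stream: 50 700 m
        System.out.println(p.getX() + ", " + p.getY());
    }
}
```

With that translation in effect, the "50 700" written in the stream ends up
at (50, 600) on the page, i.e. 100 units lower than the raw numbers suggest.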

Tilman

On 29.09.2014 at 10:56, Frank van der Hulst wrote:
> Hi all,
>
> I'm new to the list... I beg your indulgence if I'm out of line here, but
> here goes...
>
> I'm working on a PDF table extractor.  This is my second attempt at it, and
> this one is based on extending PageDrawer.
>
> In particular, I'm looking for table cells delineated by vertical &
> horizontal lines, and then grabbing whatever text is inside the rectangle.
>
> This works well for most PDFs I've tried (admittedly all from the same
> source), but there's a large subset that it doesn't work on. I've debugged
> my way through one, and it appears that when processStream(page,
> page.findResources(), page.getContents().getStream()); calls fillPath() or
> strokePath() to draw the lines, they aren't drawn in the correct place.
> They seem to be offset some distance down the page.
>
> I've looked at a couple of my troublesome PDFs, and one thing they have in
> common is that they are v1.4, whereas the ones that work are v1.7.
>
> Sooo... Has anyone encountered this before? Is there a known bug with
> PageDrawer.processStream() or perhaps with the PdfStreamEngine and drawing
> of v1.4 PDFs?
>
> I'm happy to share my source code and example PDFs with anyone if it would
> help.
>
> Thanks
>
> Frank
>

