pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: PageDrawer bug?
Date Mon, 29 Sep 2014 09:18:19 GMT
Which version of PDFBox are you working with?

If you convert the troublesome PDFs to an image file using PDFToImage, are the lines also in
the wrong position?

BR

Maruan Sahyoun

On 29.09.2014 at 10:56, Frank van der Hulst <drifter.frank@gmail.com> wrote:

> Hi all,
> 
> I'm new to the list... I beg your indulgence if I'm out of line here, but
> here goes...
> 
> I'm working on a PDF table extractor.  This is my second attempt at it, and
> this one is based on extending PageDrawer.
> 
> In particular, I'm looking for table cells delineated by vertical &
> horizontal lines, and then grabbing whatever text is inside the rectangle.
> 
> This works well for most PDFs I've tried (admittedly all from the same
> source), but there's a large subset that it doesn't work on. I've debugged
> my way through one, and it appears that when processStream(page,
> page.findResources(), page.getContents().getStream()); calls fillPath() or
> strokePath() to draw the lines, they aren't drawn in the correct place.
> They seem to be offset some distance down the page.
> 
> I've looked at a couple of my troublesome PDFs, and one thing they have in
> common is that they are v1.4, whereas the ones that work are v1.7.
> 
> Sooo... Has anyone encountered this before? Is there a known bug with
> PageDrawer.processStream() or perhaps with the PdfStreamEngine and drawing
> of v1.4 PDFs?
> 
> I'm happy to share my source code and example PDFs with anyone if it would
> help.
> 
> Thanks
> 
> Frank
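
For readers following the thread, the cell detection Frank describes (vertical and horizontal ruling lines delimiting rectangles) can be sketched roughly as below. This is not Frank's code and not part of PDFBox: the class TableCellSketch and its methods are hypothetical names, and the sketch assumes a simple full grid in which every ruling line spans the whole table. It only uses java.awt.geom, so the input segments would have to be collected by whatever PageDrawer subclass captures the stroked and filled paths.

```java
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper, not part of PDFBox: turns ruling lines into cell rectangles. */
public class TableCellSketch {

    /** Snaps a coordinate to a multiple of tol (tol must be > 0) so near-duplicate lines merge. */
    static double snap(double v, double tol) {
        return Math.round(v / tol) * tol;
    }

    /**
     * Assumes a full grid: every vertical line contributes a column boundary
     * and every horizontal line a row boundary. Merged or partial cells are
     * not handled by this sketch.
     */
    static List<Rectangle2D> cells(List<Line2D> segments, double tol) {
        TreeSet<Double> xs = new TreeSet<>();
        TreeSet<Double> ys = new TreeSet<>();
        for (Line2D s : segments) {
            if (Math.abs(s.getX1() - s.getX2()) <= tol) {
                xs.add(snap(s.getX1(), tol));      // vertical ruling line
            } else if (Math.abs(s.getY1() - s.getY2()) <= tol) {
                ys.add(snap(s.getY1(), tol));      // horizontal ruling line
            }
            // Diagonal segments are ignored.
        }
        List<Double> xl = new ArrayList<>(xs);     // sorted column boundaries
        List<Double> yl = new ArrayList<>(ys);     // sorted row boundaries
        List<Rectangle2D> out = new ArrayList<>();
        for (int r = 0; r + 1 < yl.size(); r++) {
            for (int c = 0; c + 1 < xl.size(); c++) {
                out.add(new Rectangle2D.Double(
                        xl.get(c), yl.get(r),
                        xl.get(c + 1) - xl.get(c),
                        yl.get(r + 1) - yl.get(r)));
            }
        }
        return out;
    }
}
```

Each resulting Rectangle2D could then be handed to a region-based text extractor such as PDFBox's PDFTextStripperByArea. If the ruling lines are captured at the wrong vertical offset, as reported above, the rectangles will be offset too, so the line positions need to be verified first.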

