pdfbox-users mailing list archives

From Frank van der Hulst <drifter.fr...@gmail.com>
Subject Re: PageDrawer bug?
Date Tue, 30 Sep 2014 03:56:39 GMT
Thanks for the replies... I'm working with 1.8.7, but the same applied to
1.8.6 and I think 1.8.5.

convertToImage() works properly, which was a bit surprising when I looked
into it and found that it created a PageDrawer object. So I tried copying
the source code for convertToImage into my code. That worked fine too.

Then I tried copying the source from
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/pdfbox/1.8.6/org/apache/pdfbox/pdfviewer/PageDrawer.java?av=f
(couldn't find 1.8.7) into my own PageDrawer class. That *doesn't* work
properly... lines aren't drawn at all (probably off the page?). I don't
understand this at all... surely identical code will do the same thing???
Or is something else in the pdfbox library directly accessing
org.apache.pdfbox.pdfviewer.PageDrawer via one of its public methods?

This may be the case because when I changed my PageDrawer to extend
org.apache.pdfbox.pdfviewer.PageDrawer instead of PDFStreamEngine, it
worked perfectly. Which is all the more confusing because my original class
extended PageDrawer and didn't work.

Frank


On Tue, Sep 30, 2014 at 5:04 AM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> Hi,
>
> The best is to upload the code and the PDFs to a public location.
>
> PDF is not easy... coordinates that you see in the stream are always
> relative to the current transformation matrix.
>
> Tilman
>
> On 29.09.2014 at 10:56, Frank van der Hulst wrote:
>
>> Hi all,
>>
>> I'm new to the list... I beg your indulgence if I'm out of line here, but
>> here goes...
>>
>> I'm working on a PDF table extractor. This is my second attempt at it, and
>> this one is based on extending PageDrawer.
>>
>> In particular, I'm looking for table cells delineated by vertical &
>> horizontal lines, and then grabbing whatever text is inside the rectangle.
>>
>> This works well for most PDFs I've tried (admittedly all from the same
>> source), but there's a large subset that it doesn't work on. I've debugged
>> my way through one, and it appears that when
>> processStream(page, page.findResources(), page.getContents().getStream());
>> calls fillPath() or strokePath() to draw the lines, they aren't drawn in
>> the correct place. They seem to be offset some distance down the page.
>>
>> I've looked at a couple of my troublesome PDFs, and one thing they have in
>> common is that they are v1.4, whereas the ones that work are v1.7.
>>
>> Sooo... Has anyone encountered this before? Is there a known bug with
>> PageDrawer.processStream() or perhaps with the PDFStreamEngine and drawing
>> of v1.4 PDFs?
>>
>> I'm happy to share my source code and example PDFs with anyone if it would
>> help.
>>
>> Thanks
>>
>> Frank
>>
>>
>
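
Tilman's remark that stream coordinates are always relative to the current
transformation matrix (CTM) can be illustrated with a small sketch. It uses
only java.awt.geom from the JDK, not PDFBox. The class name CtmSketch, the
helper toDeviceSpace, and the matrix values are hypothetical, chosen for
illustration; they are not taken from the PDFs discussed in this thread.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical illustration class; not part of PDFBox.
class CtmSketch {

    // Maps a point given in content-stream coordinates through the CTM.
    static Point2D.Double toDeviceSpace(AffineTransform ctm, double x, double y) {
        Point2D.Double out = new Point2D.Double();
        ctm.transform(new Point2D.Double(x, y), out);
        return out;
    }

    public static void main(String[] args) {
        // Assumed CTM, as a "1 0 0 1 0 -200 cm" operator would set it:
        // a pure translation 200 units down the page.
        // AffineTransform takes (m00, m10, m01, m11, m02, m12), which lines up
        // with the PDF operand order (a, b, c, d, e, f).
        AffineTransform ctm = new AffineTransform(1, 0, 0, 1, 0, -200);

        // A line endpoint written as "50 700" in the stream ends up lower.
        Point2D.Double p = toDeviceSpace(ctm, 50, 700);
        System.out.println(p.x + ", " + p.y); // prints 50.0, 500.0

        // A further "cm" concatenates onto the existing CTM: the new matrix
        // is applied to the point first, then the previous CTM.
        AffineTransform nested = AffineTransform.getTranslateInstance(0, 100);
        nested.concatenate(AffineTransform.getScaleInstance(2, 2));
        Point2D.Double q = toDeviceSpace(nested, 10, 10);
        System.out.println(q.x + ", " + q.y); // prints 20.0, 120.0
    }
}
```

If a custom drawer reads path coordinates without applying the CTM that is in
effect at that point in the stream, lines can appear shifted in this way. That
is one possible explanation for the offset described above, not a confirmed
diagnosis.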
