pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PageDrawer bug?
Date Tue, 30 Sep 2014 06:00:28 GMT
Hi,

It is best to download the source code from the official site rather than 
from secondary websites.

https://pdfbox.apache.org/download.cgi#recent

I still can't tell why it doesn't work for you because you didn't post 
your code :-(

Tilman



On 30.09.2014 at 05:56, Frank van der Hulst wrote:
> Thanks for the replies... I'm working with 1.8.7, but the same applied to
> 1.8.6 and I think 1.8.5.
>
> convertToImage() works properly, which was a bit surprising when I looked
> into it and found that it created a PageDrawer object. So I tried copying
> the source code for convertToImage into my code. That worked fine too.
>
> Then I tried copying the source from
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/pdfbox/1.8.6/org/apache/pdfbox/pdfviewer/PageDrawer.java?av=f
> (couldn't find 1.8.7) into my own PageDrawer class. That *doesn't* work
> properly... lines aren't drawn at all (probably off the page?). I don't
> understand this at all... surely identical code will do the same thing???
> Or is something else in the pdfbox library directly accessing
> org.apache.pdfbox.pdfviewer.PageDrawer via one of its public methods?
>
> This may be the case because when I changed my PageDrawer to extend
> org.apache.pdfbox.pdfviewer.PageDrawer instead of PDFStreamEngine, it
> worked perfectly. Which is all the more confusing because my original class
> extended PageDrawer and didn't work.
>
> Frank
>
>
> On Tue, Sep 30, 2014 at 5:04 AM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> Hi,
>>
>> It would be best to upload the code and the PDFs to a public location.
>>
>> PDF is not easy... coordinates that you see in the stream are always
>> relative to the current transformation matrix.
>>
>> Tilman
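
To illustrate the point about the current transformation matrix (CTM), here is a minimal sketch using only java.awt.geom, not PDFBox itself. The class name CtmDemo, the helper toDeviceSpace, and the sample matrix and coordinates are all made up for illustration; they are not taken from the PDFs discussed in this thread. It only shows how a coordinate read literally from a content stream can land somewhere else once a preceding `cm` operator is taken into account, which is one possible (not confirmed) explanation for lines appearing offset.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class CtmDemo {
    /** Maps a point given in content-stream coordinates through the supplied CTM. */
    static Point2D toDeviceSpace(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Hypothetical CTM, as if the stream contained "1 0 0 1 0 -200 cm":
        // a pure translation of 200 units in the negative y direction.
        // The PDF operands a b c d e f map to (m00, m10, m01, m11, m02, m12).
        AffineTransform ctm = new AffineTransform(1.0, 0.0, 0.0, 1.0, 0.0, -200.0);

        // A line endpoint as it would literally appear in the stream ("72 700 m").
        Point2D raw = new Point2D.Double(72.0, 700.0);
        Point2D placed = toDeviceSpace(ctm, raw.getX(), raw.getY());

        System.out.println("stream coords: " + raw);
        System.out.println("after CTM:     " + placed);
    }
}
```

A drawing class that ignores or mishandles the CTM would use the raw coordinates and draw the line in the wrong place, so comparing raw and transformed values for a problem PDF may be a useful first check.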
>>
>> On 29.09.2014 at 10:56, Frank van der Hulst wrote:
>>
>>> Hi all,
>>> I'm new to the list... I beg your indulgence if I'm out of line here, but
>>> here goes...
>>>
>>> I'm working on a PDF table extractor. This is my second attempt at it, and
>>> this one is based on extending PageDrawer.
>>>
>>> In particular, I'm looking for table cells delineated by vertical &
>>> horizontal lines, and then grabbing whatever text is inside the rectangle.
>>>
>>> This works well for most PDFs I've tried (admittedly all from the same
>>> source), but there's a large subset that it doesn't work on. I've debugged
>>> my way through one, and it appears that when processStream(page,
>>> page.findResources(), page.getContents().getStream()); calls fillPath() or
>>> strokePath() to draw the lines, they aren't drawn in the correct place.
>>> They seem to be offset some distance down the page.
>>>
>>> I've looked at a couple of my troublesome PDFs, and one thing they have in
>>> common is that they are v1.4, whereas the ones that work are v1.7.
>>>
>>> Sooo... Has anyone encountered this before? Is there a known bug with
>>> PageDrawer.processStream() or perhaps with the PDFStreamEngine and drawing
>>> of v1.4 PDFs?
>>>
>>> I'm happy to share my source code and example PDFs with anyone if it would
>>> help.
>>>
>>> Thanks
>>>
>>> Frank
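
The cell-lookup idea described above (find cells bounded by vertical and horizontal lines, then collect the text inside each rectangle) could be sketched roughly as follows. This is not Frank's code, which was never posted. The class CellLookup, its methods, and the assumption that the ruling lines form a complete regular grid with already-sorted positions are all hypothetical simplifications, and the coordinates are assumed to have been transformed into one common coordinate space beforehand.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class CellLookup {
    /** Builds one rectangle per grid cell from sorted ruling-line positions (row by row). */
    static List<Rectangle2D> cells(double[] xs, double[] ys) {
        List<Rectangle2D> out = new ArrayList<>();
        for (int r = 0; r + 1 < ys.length; r++) {
            for (int c = 0; c + 1 < xs.length; c++) {
                out.add(new Rectangle2D.Double(
                        xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r]));
            }
        }
        return out;
    }

    /** Returns the index of the cell containing the point, or -1 if no cell contains it. */
    static int cellIndexOf(List<Rectangle2D> cells, Point2D p) {
        for (int i = 0; i < cells.size(); i++) {
            if (cells.get(i).contains(p)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical positions of the vertical (x) and horizontal (y) ruling lines.
        double[] xs = {0.0, 100.0, 200.0};
        double[] ys = {0.0, 50.0, 100.0};
        List<Rectangle2D> grid = cells(xs, ys);

        // Hypothetical position of a piece of text.
        Point2D textPos = new Point2D.Double(150.0, 75.0);
        System.out.println("text falls in cell " + cellIndexOf(grid, textPos));
    }
}
```

If the lines are drawn at an offset, as reported for the v1.4 files, the rectangles built this way would no longer enclose the text, so the lookup would return the wrong cell or none.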
>>>
>>>

