pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Help identifying hair-lines in PDFs using PDFBox and tabula
Date Tue, 23 May 2017 09:54:10 GMT
> Gilad Denneboom <gilad.denneboom@gmail.com> wrote on 22 May 2017 at 22:07:
> Hi all,
> So I'm trying to identify hair-lines in my PDFs. I came across tabula,
> which seems to be able to do it, but I can't get it to quite work with my
> files in the way I need it to, so any help is greatly appreciated!
> Here's what I've been doing so far: I used the Ruling object from tabula to
> extract both the horizontal and vertical rules from a stripped version of
> the PDF page (i.e., after removing all the text in it).
> I'm getting results but now I want to relate them back to the original PDF
> page, and that's proving difficult. If I add a text field using the
> coordinates of the Ruling objects, they are way off from where I would
> expect them to be. I think it has to do with the DPI setting used to
> convert the PDF page to an image, which is necessary for the rulings
> extraction.
> So my question is: How can I take these Ruling objects and convert them
> back to the original coordinates of the PDF?
> I would also like to be able to only identify lines of a certain width and
> height, but if I get the rectangles to work correctly I think I can do that
> in post-processing.
Sounds like a question for the tabulapdf community ...
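
That said, if the rulings really are expressed in pixels of an image rendered
at a known DPI, the mapping back to PDF coordinates is plain unit arithmetic.
The following is only a minimal sketch under several assumptions: the ruling
coordinates use a top-left origin with y growing downwards, the page is not
rotated, and its visible box starts at (0, 0). RulingCoordinates, toPdfRect
and isHairline are hypothetical names, not part of PDFBox or tabula. PDF user
space defaults to 72 units per inch with the origin at the bottom left.

```java
/** Sketch: map a rectangle from rendered-image pixels to PDF user space. */
final class RulingCoordinates {
    // Default PDF user space: 1 unit = 1/72 inch.
    static final double POINTS_PER_INCH = 72.0;

    /**
     * Converts a rectangle given in image pixels (top-left origin, y down)
     * into PDF points (bottom-left origin, y up).
     * Returns {x, y, width, height}, where (x, y) is the lower-left corner.
     * Assumes an unrotated page whose box starts at (0, 0).
     */
    static double[] toPdfRect(double xPx, double yPx, double wPx, double hPx,
                              double dpi, double pageHeightPt) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        double scale = POINTS_PER_INCH / dpi; // points per pixel
        double x = xPx * scale;
        double w = wPx * scale;
        double h = hPx * scale;
        // Flip the vertical axis: the top edge in the image becomes
        // (page height - top) in PDF space; subtract h for the lower edge.
        double y = pageHeightPt - yPx * scale - h;
        return new double[] {x, y, w, h};
    }

    /** True if the thinner side of the rectangle is at most maxThicknessPt. */
    static boolean isHairline(double[] rectPt, double maxThicknessPt) {
        return Math.min(rectPt[2], rectPt[3]) <= maxThicknessPt;
    }
}
```

With PDFBox, the page height in points could be taken from
page.getMediaBox().getHeight(). If the rulings are already in points rather
than pixels, pass a dpi of 72 so that only the vertical flip is applied.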

> Thanks in advance!
> Gilad

