pdfbox-users mailing list archives

From fx YAN BING <yan.b...@fujixerox.co.jp>
Subject RE: Text extraction and clip area
Date Thu, 01 Dec 2016 09:22:48 GMT
Hi, this is Yan from Japan.
I'm also a PDFBox user.

I haven't fully understood your problem.
Do you want to process the contents inside a form?

I can share some sample code from my project.
It uses PDFStreamEngine to get the form objects in a PDF.
I hope it helps.


-----Original Message-----
From: Andrea Vacondio [mailto:andrea.vacondio@gmail.com] 
Sent: Thursday, December 1, 2016 6:02 PM
To: users@pdfbox.apache.org
Subject: Text extraction and clip area

Hi, I had a couple of issues with text extraction, so I tried to dig a bit into the code.
As far as I can see, the "current clipping area" is never used during text extraction. Is this
correct? My issue is with a form XObject whose bounding box clips out part of the text,
but that text is still returned by the text stripper.
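
To make the question concrete, here is a minimal sketch of the kind of check being asked about: dropping glyphs that fall outside a form XObject's bounding box. It uses only java.awt.geom and is not PDFBox code. The names ClipFilterSketch, Glyph and insideClip are hypothetical, and it assumes the glyph origins are already in page space and that the form-to-page transform is known.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, not part of the PDFBox API.
class ClipFilterSketch {

    /** Hypothetical stand-in for one glyph reported by a text extractor. */
    static final class Glyph {
        final String text;
        final double x; // glyph origin in page space (assumption)
        final double y;

        Glyph(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Keeps only the glyphs whose origin lies inside the form's bounding box
     * once that box has been mapped into page space.
     */
    static List<Glyph> insideClip(List<Glyph> glyphs,
                                  Rectangle2D bbox,
                                  AffineTransform formToPage) {
        // Map the form-space bounding box into page space.
        Shape clip = formToPage.createTransformedShape(bbox);
        List<Glyph> kept = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (clip.contains(g.x, g.y)) {
                kept.add(g);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Rectangle2D bbox = new Rectangle2D.Double(0, 0, 100, 50);
        AffineTransform formToPage = AffineTransform.getTranslateInstance(10, 20);
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("visible", 50, 40),  // inside the mapped box
                new Glyph("clipped", 50, 80)); // above the mapped box
        for (Glyph g : insideClip(glyphs, bbox, formToPage)) {
            System.out.println(g.text);
        }
    }
}
```

Testing only the glyph origin is a simplification. A glyph that straddles the box edge would need its full outline intersected with the clip shape.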