pdfbox-users mailing list archives

From James Green <james.mk.gr...@gmail.com>
Subject Extracting text by region
Date Fri, 06 Jun 2014 10:59:44 GMT
I am now confused, and I thought this would be a ten-minute job...

I have an A4 PDF document. I know the area of the page where an address
would be found, so I created a PDFTextStripperByArea and set a corresponding
region, named "Address".

This yielded nothing. Enlarging the region by trial and error eventually
showed the address. Puzzled, I asked for the media box size, and it's a lot
larger than the size of A4.

So if I'm not expected to provide a region in mm, what units does it accept?
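
To make the unit question concrete, here is a minimal sketch of the conversion that would apply if the region were expressed in default PDF user space units (1/72 inch, i.e. points) rather than mm. That is an assumption and is not confirmed anywhere in this message. The class RegionUnits, the helpers mmToPoints and regionFromMm, and the example address rectangle are all hypothetical names and values chosen for illustration.

```java
import java.awt.geom.Rectangle2D;

public class RegionUnits {
    // ASSUMPTION: the region is measured in default PDF user space units,
    // where 1 unit = 1/72 inch. With 1 inch = 25.4 mm, 1 mm = 72 / 25.4 units.
    static final double POINTS_PER_MM = 72.0 / 25.4;

    // Hypothetical helper: convert a length in millimetres to points.
    static double mmToPoints(double mm) {
        return mm * POINTS_PER_MM;
    }

    // Hypothetical helper: build a rectangle in points from mm measurements.
    // Which corner of the page counts as the origin is not addressed here.
    static Rectangle2D regionFromMm(double xMm, double yMm, double wMm, double hMm) {
        return new Rectangle2D.Double(
                mmToPoints(xMm), mmToPoints(yMm), mmToPoints(wMm), mmToPoints(hMm));
    }

    public static void main(String[] args) {
        // A full A4 page (210 mm x 297 mm) expressed in points.
        Rectangle2D a4 = regionFromMm(0, 0, 210, 297);
        System.out.println("A4 in points: " + a4.getWidth() + " x " + a4.getHeight());

        // Made-up address window; a rectangle like this would be the second
        // argument to addRegion(String, Rectangle2D).
        Rectangle2D address = regionFromMm(20, 45, 90, 40);
        System.out.println("Address region in points: " + address);
    }
}
```

Under that assumption, A4 comes out at roughly 595 x 842 units, which would be consistent with a media box that looks much larger than 210 x 297.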

Incidentally, the Javadoc
http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/util/PDFTextStripperByArea.html#addRegion(java.lang.String, java.awt.geom.Rectangle2D)
leads me to wonder what "grouping" actually does. I imagined the
documentation would read:

"Adds a rectangular area of the page for reading text within."

Thanks,

James
