pdfbox-users mailing list archives

From Gilad Denneboom <gilad.denneb...@gmail.com>
Subject Re: Extracting text by region
Date Fri, 06 Jun 2014 11:23:54 GMT
Unless stated otherwise, all measurements in a PDF are in PostScript
points, defined as 72 points to one inch. A region therefore has to be
given in points, not mm, and an A4 media box reads roughly 595 x 842.
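
As a rough sketch of the conversion (not taken from PDFBox itself): one inch is 25.4 mm, so a length in mm is multiplied by 72/25.4 to get points. The class name, the helper names and the example address coordinates below are made up for illustration. The resulting rectangle is the kind of java.awt.geom.Rectangle2D that addRegion accepts, but which corner of the page the region is measured from is not covered here and should be checked against the PDFBox version in use.

```java
import java.awt.geom.Rectangle2D;

public class MmToPoints {
    // PDF user space defaults to 72 units (points) per inch.
    public static final double POINTS_PER_INCH = 72.0;
    public static final double MM_PER_INCH = 25.4;

    // Convert a length in millimetres to PostScript points.
    public static double mmToPt(double mm) {
        return mm * POINTS_PER_INCH / MM_PER_INCH;
    }

    // Build a rectangle in points from a position and size given in mm.
    public static Rectangle2D.Double regionFromMm(double xMm, double yMm,
                                                  double widthMm, double heightMm) {
        return new Rectangle2D.Double(
                mmToPt(xMm), mmToPt(yMm), mmToPt(widthMm), mmToPt(heightMm));
    }

    public static void main(String[] args) {
        // A4 is 210 mm x 297 mm; print its size in points.
        System.out.println("A4 width in points:  " + mmToPt(210));
        System.out.println("A4 height in points: " + mmToPt(297));

        // Hypothetical address window: 20 mm from the left, 45 mm down,
        // 90 mm wide and 40 mm high. These numbers are placeholders.
        Rectangle2D.Double address = regionFromMm(20, 45, 90, 40);
        System.out.println("Address region in points: " + address);
    }
}
```

The rectangle from regionFromMm could then be handed to addRegion("Address", ...) in place of one built from raw mm values.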


On Fri, Jun 6, 2014 at 12:59 PM, James Green <james.mk.green@gmail.com>
wrote:

> I am now confused, and I thought this would be a ten minute job...
>
> I have an A4 PDF document. I know the area of the page where an address
> would be found. So I created a PDFTextStripperByArea and set a
> corresponding region, named "Address".
>
> This yielded nothing. Expanding the region randomly showed the address.
> Puzzled, I asked for the media box size and it's a lot larger than the size
> of A4.
>
> So if I'm not expected to provide a region in mm, what does it accept?
>
> Incidentally, the Javadoc
>
> http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/util/PDFTextStripperByArea.html#addRegion(java.lang.String, java.awt.geom.Rectangle2D)
>
> leads me to wonder what exactly "grouping" actually does. I imagined the
> documentation would read:
>
> "Adds a rectangular area of the page for reading text within."
>
> Thanks,
>
> James
>
