pdfbox-users mailing list archives

From James Green <james.mk.gr...@gmail.com>
Subject Re: Extracting text by region
Date Fri, 06 Jun 2014 13:57:12 GMT
Thank you - works now.


On 6 June 2014 12:23, Gilad Denneboom <gilad.denneboom@gmail.com> wrote:

> Unless otherwise mentioned, all measurements in a PDF are done using
> PostScript Points, which are defined as 72 points to one inch.
>
>
> On Fri, Jun 6, 2014 at 12:59 PM, James Green <james.mk.green@gmail.com>
> wrote:
>
> > I am now confused, and I thought this would be a ten minute job...
> >
> > I have an A4 PDF document. I know the area of the page an address would
> > be found. So I created a PDFTextStripperByArea and set a region
> > correspondingly, named "Address".
> >
> > This yielded nothing. Expanding the region randomly showed the address.
> > Puzzled, I asked for the media box size and it's a lot larger than the
> > size of A4.
> >
> > So if I'm not expected to provide a region in mm, what does it accept?
> >
> > Incidentally, the Javadoc
> >
> > http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/util/PDFTextStripperByArea.html#addRegion(java.lang.String, java.awt.geom.Rectangle2D)
> >
> > leads me to wonder what exactly "grouping" actually does. I imagined
> > the documentation would read:
> >
> > "Adds a rectangular area of the page for reading text within."
> >
> > Thanks,
> >
> > James
> >
>
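
The unit conversion discussed above can be sketched as follows. This is a minimal illustration, not code from the thread: the class RegionUnits and its methods are hypothetical helpers, and the example region in millimetres is made up. It relies only on the 72-points-per-inch definition given in the reply and on 25.4 mm per inch, which puts an A4 page at roughly 595 x 842 points. The addRegion call is left as a comment because it assumes a PDFTextStripperByArea instance named stripper, and it is further assumed that the region is measured from the top-left corner of the page.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper, not part of PDFBox: converts millimetres to PDF points. */
final class RegionUnits {
    static final double POINTS_PER_INCH = 72.0; // definition quoted in the reply
    static final double MM_PER_INCH = 25.4;     // exact by definition of the inch

    /** Converts a length in millimetres to PostScript points. */
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Builds a rectangle in points from a position and size given in millimetres. */
    static Rectangle2D regionFromMm(double xMm, double yMm, double widthMm, double heightMm) {
        return new Rectangle2D.Double(
                mmToPoints(xMm), mmToPoints(yMm),
                mmToPoints(widthMm), mmToPoints(heightMm));
    }

    public static void main(String[] args) {
        // Made-up address window: 20 mm from the left, 45 mm from the top, 90 x 40 mm.
        Rectangle2D address = regionFromMm(20, 45, 90, 40);
        System.out.println(address);

        // With PDFBox on the classpath, the rectangle would then be passed on
        // (stripper is an assumed PDFTextStripperByArea instance):
        // stripper.addRegion("Address", address);
    }
}
```

Keeping the conversion in one place means regions can still be measured in millimetres on a printed page, while the values handed to the stripper are in points.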
