pdfbox-users mailing list archives

From Ravi Hegde <ravihe...@hotmail.com>
Subject RE: Text bounding box
Date Mon, 12 Nov 2012 03:43:17 GMT

This code is not clean, as I am hacking around to learn PDFBox's capabilities. For now, I have added
code to the PageDrawer.processTextPosition(TextPosition text) method that draws a bounding box around
each character, so I can visually inspect the bounds of the drawn characters. I have proposed a change
to the PDPage class to support this case (https://issues.apache.org/jira/browse/PDFBOX-1444).

public void processTextPosition( TextPosition text )
{
        //Existing processTextPosition method implementation code

        //Temporary change at the end of the method: outline each character.
        System.out.println("Font:" + text.getFont());
        graphics.setColor( Color.BLACK );
        //getY() is the baseline measured from the top of the page, so the
        //box runs from getY() - getHeight() down to the baseline.
        graphics.draw(
                new java.awt.geom.Rectangle2D.Double(
                        text.getX(), text.getY() - text.getHeight(),
                        text.getWidthDirAdj(), text.getHeight() ) );
}

I have attached the sample PDF document and an image of its first page as drawn with the PageDrawer
change. Many characters are rendered outside the boxes drawn by PageDrawer. My goal is to find the
complete bounding box around each character.
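One way to get a box that covers the descent and the area above the ascent line is to size it from the font bounding box instead of getHeight(). A PDF FontBBox is the smallest rectangle enclosing all glyphs of the font placed at a common origin, so it bounds every character, although it is usually looser than any single glyph's outline. The sketch below is not PDFBox code: it assumes the four FontBBox values (llx, lly, urx, ury) have already been read, for example via getFontBoundingBox(), that they are in glyph units of 1/1000 of the font size, and that the text is unrotated with no extra horizontal scaling. GlyphBox and fromFontBBox are hypothetical names, and the numbers in main are illustrative only.

```java
import java.awt.geom.Rectangle2D;

/**
 * Sketch: turn a font bounding box (glyph units, assumed 1000 per em)
 * into a page rectangle around one character. Coordinates are y-down:
 * the glyph origin (x, baselineY) is measured from the top-left of the page.
 */
class GlyphBox {

    /**
     * @param x         glyph origin on the baseline
     * @param baselineY baseline position, y growing downward
     * @param fontSize  font size in points
     * @param bbox      {llx, lly, urx, ury} in glyph units (lly is usually negative)
     */
    static Rectangle2D.Double fromFontBBox(double x, double baselineY,
                                           double fontSize, double[] bbox) {
        double scale = fontSize / 1000.0;              // glyph units -> points
        double left = x + bbox[0] * scale;
        double top = baselineY - bbox[3] * scale;      // ury lies above the baseline
        double width = (bbox[2] - bbox[0]) * scale;
        double height = (bbox[3] - bbox[1]) * scale;   // above-baseline part plus descent
        return new Rectangle2D.Double(left, top, width, height);
    }

    public static void main(String[] args) {
        // Hypothetical FontBBox values, chosen only for illustration.
        double[] bbox = {-168, -218, 1000, 898};
        System.out.println(fromFontBBox(72, 700, 12, bbox));
    }
}
```

For a box tighter than the font-wide FontBBox, the individual glyph outline would be needed, which this sketch does not attempt.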

Tracing TextPosition.getFont() to the console for the attached PDF document prints org.apache.pdfbox.pdmodel.font.PDType1Font.
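Because the font is a PDType1Font, the scale between glyph units and the font size is fixed: the PDF specification defines glyph space as 1/1000 of text space for every font type except Type 3, where the font's own FontMatrix applies. A minimal sketch, assuming that matrix is available as a java.awt.geom.AffineTransform, composes it with the font size and a y-flip so the same code covers both cases. It still ignores the text matrix and CTM (rotation, skew, horizontal scaling), and FontMatrixBox, standardFontMatrix and toPage are hypothetical names.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Sketch: map a glyph-space box to y-down page space by composing matrices. */
class FontMatrixBox {

    /** Glyph space -> text space for every font type except Type 3 (1/1000 scale). */
    static AffineTransform standardFontMatrix() {
        return new AffineTransform(0.001, 0, 0, 0.001, 0, 0);
    }

    /**
     * @param glyphBox   box in glyph units, y up: (llx, lly, urx - llx, ury - lly)
     * @param fontMatrix glyph space -> text space
     * @param fontSize   font size in points
     * @param x          glyph origin on the page
     * @param baselineY  baseline on the page, y growing downward
     */
    static Rectangle2D toPage(Rectangle2D glyphBox, AffineTransform fontMatrix,
                              double fontSize, double x, double baselineY) {
        AffineTransform t = new AffineTransform();
        t.translate(x, baselineY);      // applied last: move to the glyph origin
        t.scale(fontSize, -fontSize);   // text space -> points, flipping y downward
        t.concatenate(fontMatrix);      // applied first: glyph space -> text space
        return t.createTransformedShape(glyphBox).getBounds2D();
    }

    public static void main(String[] args) {
        // Illustrative FontBBox {llx, lly, urx, ury} = {-168, -218, 1000, 898}.
        Rectangle2D glyphBox = new Rectangle2D.Double(-168, -218, 1168, 1116);
        System.out.println(toPage(glyphBox, standardFontMatrix(), 12, 72, 700));
    }
}
```

With the standard 1/1000 matrix this gives the same rectangle as multiplying the FontBBox values by fontSize / 1000 directly; the matrix form only pays off for Type 3 fonts or when rotation is later added to the chain.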

Thanks for the help!

> Date: Fri, 9 Nov 2012 10:06:48 -0800
> Subject: Re: Text bounding box
> From: duane@technoracle-systems.com
> To: users@pdfbox.apache.org
> Can you post a sample of your code so we can look at it?  If you trace
> getFont() to console, what does it return?
> Duane Nickull
> ***********************************
> Technoracle Advanced Systems Inc.
> Consulting and Contracting; Proven Results!
> i.  Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
> b. http://technoracle.blogspot.com
> t.  @duanechaos
> "Don't fear the Graph!  Embrace Neo4J"
> On 2012-11-08 10:03 PM, "Ravi Hegde" <ravihegde@hotmail.com> wrote:
> >
> >I am using PDFBox to detect the text areas in a PDF document. I am
> >able to detect the main text area. It includes the text height between
> >the baseline and the ascent. However, characters with descenders are painted
> >below the baseline. Similarly, some characters are painted above the
> >ascent line. I want a box that includes the descent and the area above the
> >ascent line of each character. I tried
> >TextPosition.getFont().getFontBoundingBox() and
> >TextPosition.getFont().getFontDescriptor() without luck. Please help me
> >find the best way to detect the complete bounding box around each
> >character.
> >
> >-Thanks for the help.
> >
> > 		 	   		  
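The font descriptor values mentioned in the question can be applied the same way. In a PDF font descriptor, Ascent is positive and Descent is negative, both in glyph units, so each character can be extended from the baseline by ascent * fontSize / 1000 upward and by -descent * fontSize / 1000 downward; a text area is then the union of those boxes. This is a sketch under assumptions: the Glyph class is a hypothetical stand-in for the values read from TextPosition, the text is unrotated, and because Ascent and Descent are font-wide typographic values that some glyphs exceed (such as the characters painted above the ascent line described here), FontBBox's lly and ury can be passed instead for a more conservative box.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

/** Sketch: a text-area box as the union of ascent/descent-extended character boxes. */
class TextAreaBox {

    /** Hypothetical per-character input: baseline origin (y-down), advance width, font size. */
    static final class Glyph {
        final double x, baselineY, width, fontSize;

        Glyph(double x, double baselineY, double width, double fontSize) {
            this.x = x;
            this.baselineY = baselineY;
            this.width = width;
            this.fontSize = fontSize;
        }
    }

    /**
     * @param ascent  font descriptor Ascent in glyph units (positive)
     * @param descent font descriptor Descent in glyph units (negative)
     * @return the union of all character boxes, or null for an empty list
     */
    static Rectangle2D union(List<Glyph> glyphs, double ascent, double descent) {
        Rectangle2D area = null;
        for (Glyph g : glyphs) {
            double scale = g.fontSize / 1000.0;              // glyph units -> points
            double top = g.baselineY - ascent * scale;       // above the baseline
            double bottom = g.baselineY - descent * scale;   // descent < 0: below the baseline
            Rectangle2D box = new Rectangle2D.Double(g.x, top, g.width, bottom - top);
            if (area == null) {
                area = box;
            } else {
                area.add(box);                               // grow to enclose this box
            }
        }
        return area;
    }

    public static void main(String[] args) {
        // Illustrative values only: two 12 pt characters on one line.
        List<Glyph> line = Arrays.asList(
                new Glyph(72, 700, 6, 12),
                new Glyph(78, 700, 6, 12));
        System.out.println(union(line, 683, -217));
    }
}
```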