pdfbox-users mailing list archives

From Ilija Pavlic <ilija.pav...@gmail.com>
Subject PDFTextStripperByArea coordinates
Date Wed, 04 Jan 2012 11:53:23 GMT
I am having issues with coordinates. The PDFTextStripperByArea region
seems to be pushed too high.

Consider the following example snippet:

...
    PDPage page = (PDPage) allPages.get(0);
    PDFTextStripperByArea stripper = new PDFTextStripperByArea();

    // define region for extraction -- the coordinates and dimensions are x, y, width, height
    Rectangle region = new Rectangle((int) x, (int) y, (int) width, (int) height);
    stripper.addRegion("test region", region);

    // overlay the region with a cyan rectangle to check if I got the coordinates and dimensions right
    PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
    contentStream.setNonStrokingColor( Color.CYAN );
    contentStream.fillRect( (int)x, (int)y, (int)width, (int)height );
    contentStream.close();

    // extract the text from the defined region
    stripper.extractRegions(page);
    String content = stripper.getTextForRegion("test region");
...
    document.save(...);
...

The cyan rectangle overlays the desired region nicely. The stripper, on
the other hand, misses a couple of lines at the bottom of the rectangle
and includes a couple of lines above it. What is going on?
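One possible explanation, offered as an assumption to check against the PDFBox version in use rather than a confirmed diagnosis: the overlay is drawn in PDF user space, whose origin is the lower-left corner of the page (y grows upward), while PDFTextStripperByArea regions are commonly described as using a top-left origin (y grows downward). If so, the same (x, y, width, height) numbers describe two different areas. The sketch below shows the conversion between the two conventions. `RegionCoordinates` and `toTopLeftOrigin` are hypothetical names, not PDFBox API, and the sketch also assumes the page box starts at (0, 0).

```java
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical helper (not part of PDFBox). Converts a rectangle given in
 * PDF user space (origin lower-left, y grows upward) into a top-left-origin
 * rectangle (y grows downward), the convention assumed here for
 * PDFTextStripperByArea regions. Assumes the page box starts at (0, 0).
 */
public final class RegionCoordinates {

    private RegionCoordinates() {
    }

    /**
     * @param x          left edge in user space
     * @param y          bottom edge in user space
     * @param width      rectangle width
     * @param height     rectangle height
     * @param pageHeight height of the page box
     * @return the same area, with y measured down from the top of the page
     */
    public static Rectangle2D.Float toTopLeftOrigin(float x, float y, float width,
                                                    float height, float pageHeight) {
        // The top edge in user space is y + height; measured from the top
        // of the page it lies at pageHeight - (y + height).
        float topY = pageHeight - (y + height);
        // x, width and height are unaffected by flipping the y axis.
        return new Rectangle2D.Float(x, topY, width, height);
    }

    public static void main(String[] args) {
        // A 200 x 100 pt box whose bottom edge sits 100 pt above the
        // bottom of a 792 pt tall (US Letter) page.
        Rectangle2D.Float r = toTopLeftOrigin(50f, 100f, 200f, 100f, 792f);
        System.out.println(r.getX() + " " + r.getY() + " " + r.getWidth() + " " + r.getHeight());
    }
}
```

Because flipping the y axis twice is the identity, the same helper also maps a top-left-origin region back to user space, e.g. for drawing the check overlay with the content stream.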

Thank you,
Ilija.
