pdfbox-dev mailing list archives

From "Ismael Hasan Romero (JIRA)" <j...@apache.org>
Subject [jira] Created: (PDFBOX-495) PDFTextStripperByArea extracts text only from 1 region, despite several regions being defined
Date Thu, 23 Jul 2009 10:19:14 GMT
PDFTextStripperByArea extracts text only from 1 region, despite several regions being defined
---------------------------------------------------------------------------------------------

                 Key: PDFBOX-495
                 URL: https://issues.apache.org/jira/browse/PDFBOX-495
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 0.8.0-incubator
         Environment: Debian, java SE 6
            Reporter: Ismael Hasan Romero


When several regions are defined in a PDFTextStripperByArea, text is
retrieved from only one of them. The problem can be reproduced with the
following steps:

Divide a page into four regions and add them to the stripper in
the following order:
1-upper left, 2-upper right, 3-lower left, 4-lower right.

After calling the "extractRegions" function, only the text for the third
region is retrieved.

If the third region is not added (i.e., only regions 1, 2 and 4 are added), only the text
for region 2 is retrieved.
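
The region setup described above could be sketched roughly as follows. This
is only an illustration, not the reporter's actual code: the class name
QuadrantRegions, the region names, the 612 x 792 page size and the top-left
coordinate origin are all assumptions. The PDFBox calls are shown only in
comments so that the sketch runs without PDFBox on the classpath.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the four regions in the order given in the report.
class QuadrantRegions {

    // Splits a page into four equal quadrants. A LinkedHashMap keeps the
    // insertion order: upper left, upper right, lower left, lower right.
    // Assumes the origin is at the top-left corner of the page.
    static Map<String, Rectangle2D> split(double pageWidth, double pageHeight) {
        double w = pageWidth / 2;
        double h = pageHeight / 2;
        Map<String, Rectangle2D> regions = new LinkedHashMap<String, Rectangle2D>();
        regions.put("upperLeft", new Rectangle2D.Double(0, 0, w, h));
        regions.put("upperRight", new Rectangle2D.Double(w, 0, w, h));
        regions.put("lowerLeft", new Rectangle2D.Double(0, h, w, h));
        regions.put("lowerRight", new Rectangle2D.Double(w, h, w, h));
        return regions;
    }

    public static void main(String[] args) {
        // 612 x 792 points (US Letter) is an assumed page size.
        for (Map.Entry<String, Rectangle2D> e : split(612, 792).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
            // With PDFBox available, each region would be registered with:
            //   stripper.addRegion(e.getKey(), e.getValue());
        }
        // Then, for a given page:
        //   stripper.extractRegions(page);
        //   String text = stripper.getTextForRegion("upperLeft");
        // According to the report, only one region returns text at this point.
    }
}
```

Removing the "lowerLeft" entry before registering the regions would correspond
to the second scenario in the report (only regions 1, 2 and 4 added).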

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

