pdfbox-dev mailing list archives

From "Brian Carrier (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PDFBOX-374) text areas not properly being sorted because of page rotation
Date Thu, 18 Sep 2008 13:49:44 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632214#action_12632214 ]

Brian Carrier commented on PDFBOX-374:

Ah, I did not see that. The fix for TextPositionComparator is the same.  The logic in PDFTextStripper
that adjusts the rotation is different though.  I'll look into the differences.  

> text areas not properly being sorted because of page rotation
> -------------------------------------------------------------
>                 Key: PDFBOX-374
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-374
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Brian Carrier
>         Attachments: PDFStreamEngine.diff, PDFTextStripper.diff, rotation.pdf, TextPositionComparator.diff
> When PDFTextStripper is set to sort the text before outputting, the sorting is not correct
> if a page rotation exists. The reason is that both TextPositionComparator and PDFStreamEngine
> take the rotation into account, so the rotation has been applied twice by the time the comparison
> is done in TextPositionComparator.
> Also, the rotation code in PDFStreamEngine does not seem consistent. I verified that
> the code for 0 and 90 degrees works, but the 180 and 270 cases do not seem consistent
> with the goal of adjusting the X and Y values so that 0,0 is in the upper left, which is what
> the 0 and 90 code does. I do not have examples of 180 and 270 to test with. There are no
> comments in this section, so I have been guessing about its purpose.
> The attached patches:
> - Remove the rotation from TextPositionComparator
> - Add comments and change the 180 and 270 cases to make them consistent
> with 0 and 90.
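
For readers following the description above, here is a rough sketch of the idea, not the attached patches and not PDFBox code. It assumes a clockwise page rotation of 0, 90, 180, or 270 degrees, and a page whose unrotated size is pageWidth by pageHeight with the PDF origin in the lower left. The class RotationSketch, the method toUpperLeft, and the comparator READING_ORDER are hypothetical names made up for illustration. The coordinates are normalized exactly once so that 0,0 ends up in the upper left, and the comparator then sorts on the normalized values without looking at the rotation again.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from PDFBox.
class RotationSketch {

    /**
     * Maps (x, y) from PDF user space (origin lower left, y up) to display
     * space (origin upper left, y down) for a page shown rotated clockwise.
     * pageWidth and pageHeight describe the unrotated page.
     */
    static double[] toUpperLeft(double x, double y,
                                double pageWidth, double pageHeight,
                                int rotation) {
        int r = ((rotation % 360) + 360) % 360;
        switch (r) {
            case 0:
                return new double[] { x, pageHeight - y };
            case 90:
                // The lower-left corner of the unrotated page lands in the upper left.
                return new double[] { y, x };
            case 180:
                return new double[] { pageWidth - x, y };
            case 270:
                return new double[] { pageHeight - y, pageWidth - x };
            default:
                throw new IllegalArgumentException("unsupported rotation: " + rotation);
        }
    }

    // Orders already-normalized points top to bottom, then left to right.
    // The rotation is deliberately not applied here a second time.
    static final Comparator<double[]> READING_ORDER =
            Comparator.<double[]>comparingDouble(p -> p[1])
                      .thenComparingDouble(p -> p[0]);

    public static void main(String[] args) {
        double pageWidth = 200, pageHeight = 100; // assumed example page size
        int rotation = 90;                        // assumed example rotation

        // Assumed example text origins in PDF user space.
        double[][] raw = { { 50, 10 }, { 10, 30 }, { 10, 10 } };

        List<double[]> normalized = new ArrayList<>();
        for (double[] p : raw) {
            normalized.add(toUpperLeft(p[0], p[1], pageWidth, pageHeight, rotation));
        }
        normalized.sort(READING_ORDER);

        for (double[] p : normalized) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

If the comparator also adjusted for the rotation, a 90 degree page would effectively be treated as a 180 degree page during sorting, which matches the symptom described in the issue.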

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
