pdfbox-dev mailing list archives

From "Gustavo Hexsel (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead
Date Tue, 10 Feb 2009 22:11:59 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672426#action_12672426 ]

Gustavo Hexsel commented on PDFBOX-422:

These methods used to be called from the flushPage() method, so we used them as callbacks
since we need the geometry as well as the text in our code.

The new code for PDFTextStripper is truer to its name: it really deals with text and text
only.  The problem is that the methods are still there but are no longer called.
So our code compiled, but all the text was null (since our extras weren't valid anymore).

It would have been much more useful simply to remove the methods since at least the compiler
would have flagged our code as not being a callback anymore.
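
To make the failure mode concrete, here is a minimal sketch of it. It does not use
the real PDFBox classes: Stripper, GeometryStripper, run() and writeText() are
hypothetical stand-ins, and only the name processWordSeparator is borrowed from the
issue. The base class keeps a deprecated hook but its processing loop no longer
invokes it, so the subclass override compiles and silently never runs.

```java
import java.util.ArrayList;
import java.util.List;

public class DeadCallbackDemo {

    // Hypothetical stand-in for a library class; NOT the real PDFTextStripper.
    static class Stripper {
        /** @deprecated still declared, but the processing loop no longer calls it. */
        @Deprecated
        protected void processWordSeparator() { }

        // Hypothetical hook that the "new" processing loop still calls.
        protected void writeText(String text) { }

        public void run() {
            // New flow: only writeText() is invoked.
            // processWordSeparator() is never called, so overrides of it are dead.
            writeText("hello");
            writeText("world");
        }
    }

    // Client code written against the old callback contract.
    static class GeometryStripper extends Stripper {
        final List<String> events = new ArrayList<>();

        @Override
        protected void processWordSeparator() {
            events.add("separator"); // compiles fine, never executed
        }

        @Override
        protected void writeText(String text) {
            events.add("text:" + text);
        }
    }

    public static void main(String[] args) {
        GeometryStripper stripper = new GeometryStripper();
        stripper.run();
        System.out.println(stripper.events); // prints [text:hello, text:world]
    }
}
```

If processWordSeparator() were deleted from Stripper instead, the @Override in
GeometryStripper would stop compiling, which is exactly the signal we were missing.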

We might fork the old PDFTextStripper into a TextGeometryStripper or the like, if I can get
management to approve it (probably not, my contract is up tomorrow and I'm going on vacation).

I'll post a patch if we do that.

> Methods are marked as deprecated but they're effectively dead
> -------------------------------------------------------------
>                 Key: PDFBOX-422
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-422
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Gustavo Hexsel
> There are several methods on PDFTextStripper and PDFStreamEngine that are marked @deprecated,
> but they are not really used by the existing infrastructure anymore.
> This would be ok if such methods weren't callbacks.  In this case, it breaks pre-existing
> code, and prevents the compiler from letting you know the methods are not to be used anymore.
> Simply removing the methods would have been a much better solution in this case. 
> Examples of said methods:
> org.apache.pdfbox.util.PDFTextStripper#processLineSeparator
> org.apache.pdfbox.util.PDFTextStripper#processWordSeparator
> org.apache.pdfbox.util.PDFTextStripper#writeCharacters

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
