pdfbox-dev mailing list archives

From Brian Carrier <carr...@digital-evidence.org>
Subject Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead
Date Thu, 12 Feb 2009 18:31:09 GMT
[removing this from jira]

Do you have a suggestion for how PDFBox could best solve your
situation?  Could you get the needed info by making a class that
extends PDFTextStripper and overrides processTextPosition()?  Then
you could see all of the TextPositions and where they are located.
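
A minimal sketch of that subclass-and-override pattern follows. To keep it
self-contained it does not use PDFBox: GlyphPosition and StripperSketch are
hypothetical stand-ins for TextPosition and PDFTextStripper, and their fields
and the run() method are assumptions made for illustration only. Against the
real library, the override would go on a class that extends PDFTextStripper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PDFBox's TextPosition: one glyph and where it sits.
final class GlyphPosition {
    final String text;
    final float x;
    final float y;
    final float width;

    GlyphPosition(String text, float x, float y, float width) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
    }
}

// Hypothetical stand-in for PDFTextStripper: it invokes processTextPosition()
// once per glyph while walking the content.
class StripperSketch {
    protected void processTextPosition(GlyphPosition position) {
        // A real stripper would buffer the glyph for text output here.
    }

    void run(List<GlyphPosition> glyphs) {
        for (GlyphPosition glyph : glyphs) {
            processTextPosition(glyph);
        }
    }
}

// The subclass records every glyph, then hands it on to the base class so
// normal processing is unchanged.
class PositionCollectingStripper extends StripperSketch {
    private final List<GlyphPosition> seen = new ArrayList<>();

    @Override
    protected void processTextPosition(GlyphPosition position) {
        seen.add(position);
        super.processTextPosition(position);
    }

    List<GlyphPosition> getSeen() {
        return seen;
    }
}
```

After a run, getSeen() holds one entry per glyph with its coordinates, which
is the per-character information a redaction tool would need.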

On Feb 11, 2009, at 5:27 PM, Gustavo Hexsel (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ]
>
> Gustavo Hexsel commented on PDFBOX-422:
> ---------------------------------------
>
> Thanks for the prompt response.
>
> Yes, I saw the methods; they just don't carry the text position
> anymore (also, blocks get merged).
>
> This is fine; the class is doing what it is supposed to (according to
> its name).  We had a use case (specifically document redaction)
> that needed to bring back the text and the associated position of
> each char, which we were doing by using the start position of the text
> block and each individual character width.
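
The per-character calculation described above can be sketched as a running
sum. This is an illustration only: the class and method names are
hypothetical, and it assumes horizontal text with no extra character spacing,
word spacing, or kerning adjustments, all of which a real PDF may apply.

```java
// Hypothetical helper, not part of PDFBox.
final class CharOffsets {
    private CharOffsets() {
    }

    // Returns the x coordinate of the left edge of each character, given the
    // x coordinate where the text block starts and the width of each
    // character in the same units.
    static float[] leftEdges(float blockStartX, float[] charWidths) {
        float[] edges = new float[charWidths.length];
        float x = blockStartX;
        for (int i = 0; i < charWidths.length; i++) {
            edges[i] = x;        // this character starts where the previous one ended
            x += charWidths[i];  // advance past this character
        }
        return edges;
    }
}
```

For a block starting at x = 100 with widths 5, 3 and 4, the left edges are
100, 105 and 108.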
>
>
>> Methods are marked as deprecated but they're effectively dead
>> -------------------------------------------------------------
>>
>>                 Key: PDFBOX-422
>>                 URL: https://issues.apache.org/jira/browse/PDFBOX-422
>>             Project: PDFBox
>>          Issue Type: Bug
>>          Components: Text extraction
>>    Affects Versions: 0.8.0-incubator
>>            Reporter: Gustavo Hexsel
>>
>> There are several methods on PDFTextStripper and PDFStreamEngine  
>> that are marked @deprecated, but they are not really used by the  
>> existing infrastructure anymore.
>> This would be ok if such methods weren't callbacks.  In this case,  
>> it breaks pre-existing code, and prevents the compiler from  
>> letting you know the methods are not to be used anymore.
>> Simply removing the methods would have been a much better solution  
>> in this case.
>> Example of said methods:
>> org.apache.pdfbox.util.PDFTextStripper#processLineSeparator
>> org.apache.pdfbox.util.PDFTextStripper#processWordSeparator
>> org.apache.pdfbox.util.PDFTextStripper#writeCharacters
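
The failure mode reported here can be reproduced in a small sketch.
EngineSketch and LegacyClient are hypothetical classes, not PDFBox code, and
the String parameter is an assumption chosen for brevity. The deprecated
callback still exists, so the subclass compiles, but the engine no longer
calls it and the override silently never runs.

```java
// Hypothetical engine, not PDFBox code.
class EngineSketch {
    /** @deprecated kept for source compatibility, but run() never calls it. */
    @Deprecated
    protected void writeCharacters(String text) {
        // Dead callback: nothing in this class invokes it anymore.
    }

    // The callback the engine actually uses now.
    protected void processText(String text) {
    }

    void run(String text) {
        processText(text);
    }
}

// Pre-existing client code written against the old callback.
class LegacyClient extends EngineSketch {
    final StringBuilder captured = new StringBuilder();

    @Override
    protected void writeCharacters(String text) {
        captured.append(text); // compiles, but is never reached via run()
    }
}
```

Had writeCharacters been removed from the base class instead, the @Override
in LegacyClient would fail to compile, pointing the author straight at the
code that needs to change.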
>

