pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: (pdffile) does not allow extracting content
Date Tue, 23 Feb 2016 21:22:03 GMT
On 23.02.2016 at 22:19, Brzrk One wrote:
> I get all that. I just don't see where in loadNonSeq() it is refusing to
> copy content.

It isn't refusing anywhere. In loadNonSeq() it is really just a warning that
is logged when the file is opened.

The actual refusal is in the source of the ExtractText command line utility.
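
To illustrate the split between "the library only reports the permission" and "the caller decides whether to refuse", here is a minimal, self-contained sketch. It is not PDFBox code: the class and method names are hypothetical, and the only fact it relies on is that the PDF specification assigns bit 5 of the permissions value to copying or extracting content. In PDFBox itself the corresponding check would presumably go through AccessPermission.canExtractContent() before the text stripper is called.

```java
public class ExtractPermissionSketch {
    // Bit 5 of the PDF permissions value (bits are numbered from 1, so the
    // mask is 1 << 4) controls copying or extracting text and graphics.
    static final int EXTRACT_MASK = 1 << 4;

    /** Returns true if the given permission flags allow content extraction. */
    static boolean canExtractContent(int permissionFlags) {
        return (permissionFlags & EXTRACT_MASK) != 0;
    }

    /**
     * Hypothetical caller-side guard: refuse before extracting, the way the
     * thread describes the command line utility doing it. The text itself is
     * passed in here, since a real stripper would hand it over either way.
     */
    static String extractIfAllowed(int permissionFlags, String pageText) {
        if (!canExtractContent(permissionFlags)) {
            throw new IllegalStateException("You do not have permission to extract text");
        }
        return pageText;
    }

    public static void main(String[] args) {
        int allowAll = -4;                        // all permission bits set, two reserved low bits clear
        int noExtract = allowAll & ~EXTRACT_MASK; // same flags with extraction switched off

        System.out.println(canExtractContent(allowAll));
        System.out.println(canExtractContent(noExtract));
        System.out.println(extractIfAllowed(allowAll, "some page text"));
    }
}
```

The point of keeping the guard in the caller is that code which merely opens or parses the file never has to fail; only the step that would hand content to the user does.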

Tilman

>
> On Tue, Feb 23, 2016 at 3:40 PM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> On 23.02.2016 at 21:33, Brzrk One wrote:
>>
>>> loadNonSeq() seems to respect it too?
>>> Sneaking in a bogus AccessPermission.canExtractContent() did not alter
>>> this.
>>> What does load() do that loadNonSeq() does not? (Or vice versa.)
>>>
>> They use different parsing strategies. Another difference is that
>> loadNonSeq decrypts immediately and brings up the warning.
>>
>>
>> Tilman
>>
>>
>>> On Tue, Feb 23, 2016 at 2:54 PM, Tilman Hausherr <THausherr@t-online.de>
>>> wrote:
>>>
>>>> On 23.02.2016 at 20:44, Brzrk One wrote:
>>>>> The file is:
>>>>> http://www.bmv.com.mx/docs-pub/infoifrs/infoifrs_588674_2015-01_1.pdf
>>>>>
>>>> The file is indeed protected against text extraction. Our command line
>>>> utilities respect this. The methods (of PDFTextStripper) ignore it; they
>>>> expect you to handle it. See the examples source code for how to extract
>>>> text.
>>>>
>>>> Tilman
>>>>
>>>>
>>>>
>>>>> On Tue, Feb 23, 2016 at 12:05 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>> wrote:
>>>>>
>>>>> On 23.02.2016 at 17:53, Brzrk One wrote:
>>>>>
>>>>>>> With pdfbox-1.8.11, using the bottom-up parser (loadNonSeq) on a document
>>>>>>> that has security ContentCopying: NotAllowed results in:
>>>>>>>
>>>>>>> org.apache.pdfbox.pdfparser.NonSequentialPDFParser - PDF file
>>>>>>> 'some_temp_file.pdf' does not allow extracting content
>>>>>>>
>>>>>>> And the output pages are all blank.
>>>>>>>
>>>>>>> The top-down parser (load) has no such issue.
>>>>>>>
>>>>>>> Is there a workaround?
>>>>>>>
>>>>>>>
>>>>>> I looked in the source code, this warning comes only in the non-sequential
>>>>>> parser. There's a similar error message in the ExtractText command line
>>>>>> utility ("You do not have permission to extract text").
>>>>>>
>>>>>> The best would be to upload the file somewhere.
>>>>>>
>>>>>> Tilman
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>
>>



