pdfbox-dev mailing list archives

From A...@swmc.com
Subject Re: Do we should be able to extract text from ownter-password protected pdf file?
Date Mon, 31 Aug 2009 18:17:09 GMT
I tested your patch and confirmed that this does NOT work for encrypted 
files.  Here's the stacktrace:

Exception in thread "main" 
org.apache.pdfbox.exceptions.CryptographyException: Error: The supplied 
password does not match either the owner or user password in the document.
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:184)

In case my line numbers are off, line 184 is: document.openProtection( sdm 
); which happens before the lines which were commented out by your patch.

I believe you're saying that the text can be extracted from password 
protected, non-encrypted files.  If it's possible to password protect PDFs 
without using encryption, that's news to me.   I'm not sure what the point 
would be of password protecting something if you're not going to encrypt 
it, since that would only give a false sense of security, not any actual 
security.  So, I just wanted to clear that up so people don't read your 
post and think that all PDF security is completely broken.  When I first 
read it, I thought you were implying that any password protected document 
could be read without the password.

As for whether we "should" be able to do this or not, I'd say the 
ExtractText program which comes with PDFBox should respect the permissions 
by default, and perhaps have an option to extract password protected, 
unencrypted documents (without a password).  I'm not sure what one would 
call that option... -bypassPassword ?
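
To make that suggestion concrete, here is a minimal sketch of the decision logic only. It does not use PDFBox at all: the -bypassPassword flag is just the hypothetical name floated above, not an existing ExtractText option, and the canExtract boolean is a stand-in for whatever permission value would be read from the document.

```java
import java.util.Arrays;

class ExtractPolicySketch {
    // Hypothetical flag name from the suggestion above; ExtractText has no such option.
    static final String BYPASS_FLAG = "-bypassPassword";

    // True if the hypothetical bypass flag appears anywhere on the command line.
    static boolean hasBypassFlag(String[] args) {
        return Arrays.asList(args).contains(BYPASS_FLAG);
    }

    // Respect the document's permission by default; extract anyway only
    // when the caller explicitly asked to bypass it.
    static boolean shouldExtract(boolean canExtract, boolean bypassRequested) {
        return canExtract || bypassRequested;
    }

    public static void main(String[] args) {
        // Stand-in value: assume the document forbids text extraction.
        boolean canExtract = false;
        if (shouldExtract(canExtract, hasBypassFlag(args))) {
            System.out.println("extracting text");
        } else {
            System.out.println("extraction not permitted; pass " + BYPASS_FLAG + " to override");
        }
    }
}
```

The point of the sketch is only that the default path honors the permission and the override has to be requested explicitly.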


"Takashi Komatsubara" <takashi.smi@gmail.com> 
08/31/2009 04:05

Do we should be able to extract text from ownter-password protected pdf 

Hi team,

Technically, we can extract text from an "Owner" password protected pdf 
without specifying the "owner" password. Right?

Should we be able to do that, or not?

The reason why I'm asking is that I am using PDFBox for auditing the content 
of the pdf file.
So, whether the user wants the "text extract" permission disabled or not, 
I need to look into the content of the "owner password" protected pdf.

Old PDFbox could do this.

What do you think?

