james-mime4j-dev mailing list archives

From Max Gravitt <max.grav...@gmail.com>
Subject Issue Decoding PDF Attachments
Date Sun, 26 Dec 2010 23:40:44 GMT
Hi,

I have an application (running on Google App Engine) that strips attachments from inbound
emails and saves them as a byte[] in the JDO data store.  I think I'm running into a decoding
issue, but I'm unsure of the root cause or the resolution.  For some files, the saved copy
contains equal signs in places where the original document doesn't have any.  I've found
that MS documents and HTML are rather tolerant of this, but PDFs tend to become corrupt
when it happens.  It doesn't happen with all PDFs, and it seems to happen only when the
attachment has a transfer encoding of "quoted-printable".
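
One way to check whether a saved attachment is affected is to scan the stored bytes for an
equal sign immediately followed by a line break. This is only a rough diagnostic sketch: the
class and method names are made up for illustration, it uses nothing but the JDK, and a
binary file could in principle contain such a sequence by coincidence.

```java
import java.nio.charset.StandardCharsets;

public class SoftBreakScan {

    /** Counts '=' bytes that are immediately followed by CRLF or a bare LF. */
    public static int countSoftLineBreaks(byte[] data) {
        int count = 0;
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] != '=') {
                continue;
            }
            boolean crlf = i + 2 < data.length
                    && data[i + 1] == '\r' && data[i + 2] == '\n';
            boolean lf = data[i + 1] == '\n';
            if (crlf || lf) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample resembling the corrupted PDF text.
        byte[] saved = "/Type =\r\n/Catalog\r\n/Pages 3 0 R\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("suspicious line ends: " + countSoftLineBreaks(saved));
    }
}
```

A non-zero count on a file whose original has none would suggest the bytes were stored
while still in their transfer-encoded form.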

I'm using MimeStreamParser with a subclass of SimpleContentHandler that implements the
bodyDecoded method.  In that method I call IOUtils.toByteArray(InputStream) to get the bytes
that I save.  Any idea what I may be missing?

Below is an example of the contents of a PDF as shown by the "more" command.  You can see
the extra equal signs in the second representation of the file.

Original file (Good):
1 0 obj
<<
/CreationDate (D:20101203120005)
/Producer (SCS2PDF v1.0 (\251 BeppeCosta, 2005))
/Title (PRINT1)
>>
endobj
2 0 obj
<<
/Type /Catalog
/Pages 3 0 R
>>
endobj

File Snippet After Parsing, Saving, and Retrieving (Bad):
1 0 obj
<<
/CreationDate =
(D:20101203120005)
/Producer (SCS2PDF v1.0 (\251 BeppeCosta, =
2005))
/Title (PRINT1)
>>
endobj
2 0 obj
<<
/Type =
/Catalog
/Pages 3 0 R
>>
endobj
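
The trailing "=" characters above match what RFC 2045 calls a soft line break in
quoted-printable: an "=" followed by a line break, which a decoder is supposed to remove. That
would be consistent with the bytes being saved before quoted-printable decoding, although I
have not confirmed that this is what happens here. The following is a minimal JDK-only sketch
of that decoding rule, not mime4j's implementation; the class and method names are
hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SoftLineBreakDemo {

    /** Returns the value of a hex digit, or -1 if the byte is not one. */
    private static int hex(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        return -1;
    }

    /** Minimal quoted-printable decoder (RFC 2045, section 6.7), for illustration only. */
    public static byte[] decodeQuotedPrintable(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            if (b != '=') {
                out.write(b);                       // literal byte
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;                             // soft line break "=CRLF": drop it
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;                             // tolerate "=LF" as a soft line break
            } else if (i + 2 < in.length && hex(in[i + 1]) >= 0 && hex(in[i + 2]) >= 0) {
                out.write((hex(in[i + 1]) << 4) | hex(in[i + 2])); // "=XX" escape
                i += 3;
            } else {
                out.write(b);                       // malformed escape: pass through
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Sample modeled on the corrupted snippet; CRLF line ends are an assumption.
        String stored = "/CreationDate =\r\n(D:20101203120005)\r\n/Type =\r\n/Catalog\r\n";
        byte[] decoded = decodeQuotedPrintable(stored.getBytes(StandardCharsets.ISO_8859_1));
        System.out.print(new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```

Run on the sample, the "=" and the line break after it disappear, so "/CreationDate" and
"(D:20101203120005)" end up on one line again, as in the original file.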

Any thoughts?
thanks!
MG

