james-mime4j-dev mailing list archives

From Norman Maurer <nor...@apache.org>
Subject Re: Issue Decoding PDF Attachments
Date Mon, 27 Dec 2010 20:43:46 GMT
Maybe you could provide a test case? An email that reproduces it
would help too.

Bye,
Norman


2010/12/27 Max Gravitt <max.gravitt@gmail.com>:
> Hi,
>
> Yes, I am using version 0.6.
>
> thanks
> Max
>
> On Dec 27, 2010, at 2:17 PM, Norman Maurer wrote:
>
>> Hi there,
>>
>> what version of mime4j ?
>>
>> Bye
>> Norman
>>
>>
>> 2010/12/27 Max Gravitt <max.gravitt@gmail.com>:
>>> Hi,
>>>
>>> I have an application (running on Google App Engine) that strips attachments
>>> from inbound emails and saves them as a byte[] in the JDO data store. I think I'm
>>> running into a decoding issue, but I'm unsure of the true cause or the resolution.
>>> For some files, the saved bytes contain equal signs in places where the original
>>> document has none. MS documents and HTML seem rather tolerant of this, but PDFs
>>> tend to get corrupted when it happens. It doesn't happen with all PDFs, and it
>>> seems to occur only when the attachment has a transfer encoding of
>>> "quoted-printable".
>>>
>>> I'm using MimeStreamParser, and I extended SimpleContentHandler (the bodyDecoded
>>> method). Then I use IOUtils.toByteArray(InputStream) to get the bytes that I save.
>>> Any idea what I may be missing?
>>>
>>> Below is an example of the contents of a PDF, as shown by the "more" command. You
>>> can see the equal signs in the second version of the file.
>>>
>>> Original file (Good):
>>> 1 0 obj
>>> <<
>>> /CreationDate (D:20101203120005)
>>> /Producer (SCS2PDF v1.0 (\251 BeppeCosta, 2005))
>>> /Title (PRINT1)
>>> >>
>>> endobj
>>> 2 0 obj
>>> <<
>>> /Type /Catalog
>>> /Pages 3 0 R
>>> >>
>>> endobj
>>>
>>> File Snippet After Parsing, Saving, and Retrieving (Bad):
>>> 1 0 obj
>>> <<
>>> /CreationDate =
>>> (D:20101203120005)
>>> /Producer (SCS2PDF v1.0 (\251 BeppeCosta, =
>>> 2005))
>>> /Title (PRINT1)
>>> >>
>>> endobj
>>> 2 0 obj
>>> <<
>>> /Type =
>>> /Catalog
>>> /Pages 3 0 R
>>> >>
>>> endobj
>>>
>>> Any thoughts?
>>> thanks!
>>> MG
>>>
>>>
>
>
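The "=" characters at the ends of lines in the bad snippet match what quoted-printable (RFC 2045, section 6.7) calls soft line breaks: an encoder inserts "=" plus a line break to keep encoded lines short, and a conforming decoder must remove both. Their presence in the saved bytes suggests the body was stored still encoded, or only partly decoded. The following is a minimal, standalone Java sketch of that decoding rule, written only for illustration. It is not mime4j's decoder, and the class name QuotedPrintableSketch is hypothetical. It also ignores transport padding (spaces or tabs between "=" and the line break).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {

    // Minimal quoted-printable decoder following the RFC 2045 (section 6.7) rules.
    // Illustration only: this is NOT mime4j's decoder.
    //   "=" + CRLF or "=" + bare LF -> soft line break, removed entirely
    //   "=XX" (two hex digits)      -> the single byte 0xXX
    //   anything else               -> copied through unchanged
    // Transport padding (spaces/tabs between "=" and the line break) is not handled.
    public static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                out.write(b);
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3; // soft line break with CRLF
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2; // soft line break with bare LF
            } else if (i + 2 < in.length && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
                out.write((hexValue(in[i + 1]) << 4) | hexValue(in[i + 2]));
                i += 3; // encoded octet such as "=3D"
            } else {
                out.write(b); // malformed "=": keep it literally
                i++;
            }
        }
        return out.toByteArray();
    }

    // Value of one hex digit (either case), or -1 if c is not a hex digit.
    private static int hexValue(byte c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // Quoted-printable text shaped like the "bad" snippet from the thread.
        String encoded = "/CreationDate =\r\n(D:20101203120005)\r\n"
                + "/Producer (SCS2PDF v1.0 (\\251 BeppeCosta, =\r\n2005))\r\n"
                + "/Type =\r\n/Catalog\r\n";
        byte[] decoded = decode(encoded.getBytes(StandardCharsets.US_ASCII));
        System.out.print(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

Running main prints the three entries with the "=" markers and their line breaks removed, matching the original file. The sketch accepts both CRLF and a bare LF after "="; a decoder that recognized only "=\r\n" would leave "=" plus LF in its output, which is one possible way to end up with bytes like the bad snippet.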
