james-mime4j-dev mailing list archives

From Stefano Bagnara <apa...@bago.org>
Subject Re: [jira] Assigned: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly
Date Mon, 16 Feb 2009 13:49:28 GMT
Markus Wiederkehr wrote:
> In my opinion this issue is closely related to MIME4J-112 and MIME4J-116.
> 
> I think that in the course of MIME4J-116 we should (maybe) create
> Field instances in AbstractEntity instead of later on in
> MessageBuilder. A Field object could store the raw data in a byte[]
> instead of a String which would greatly help with MIME4J-112.
> 
> The only problem is that the charset for a lenient parsing mode is not
> known at this early point. But considering your clarification about
> the lenient writing mode I wonder if anybody really needs a lenient
> parsing mode. (I wonder if anyone really needs a lenient writing mode
> for that matter.)

Lenient writing is, IMO, only needed if you need round-tripping. For
most standard MIME4J usages I don't see why we should write malformed
data to the output.

Lenient reading, on the other hand, is part of being a generic parsing
library: most email clients correctly handle 8-bit chars in the Subject
header because some email clients do write them unencoded. If you think
mime4j could be used as the library for an email client, it is probably
still worth handling 8-bit chars in the headers.
Of course there is no need to implement such a feature until someone
really asks for it or needs it.
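
A minimal sketch of what such lenient reading could look like, assuming the raw header bytes are still available and the caller picks a fallback charset. The class and method names are hypothetical and are not part of the Mime4j API: strict decoding rejects any non-ASCII byte, while lenient decoding falls back to the caller's charset instead of failing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Mime4j class.
public class LenientHeaderDecoder {

    // Strict mode: any byte outside US-ASCII is reported as an error.
    public static String decodeStrict(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    // Lenient mode: try strict decoding first; if the header contains
    // unencoded 8-bit bytes, decode it with the fallback charset instead.
    public static String decodeLenient(byte[] raw, Charset fallback) {
        try {
            return decodeStrict(raw);
        } catch (CharacterCodingException e) {
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        // A Subject header written unencoded in ISO-8859-1 (assumed example data).
        byte[] raw = "Subject: caff\u00e8".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeLenient(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The fallback charset is a guess by design: an unencoded header carries no charset label, so an email client would typically choose it from the user's locale or from the charset of the message body.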

I don't really know how many email messages contain unencoded headers
nowadays. 10 years ago, when I looked into this in depth, almost 40% of
international emails included unencoded headers. I expect this
percentage to be much lower today, but I don't know if it is 10% or 0.1%.

Stefano

> So maybe AbstractEntity should simply use US-ASCII to decode the
> header fields without direct support for a lenient parsing mode that
> nobody needs. Then AbstractEntity can build Field instances and a
> ContentHandler receives those Field instances without having to parse
> them again.
> 
> All in all I'm not sure if #118 should be addressed independently of
> 112 and 116 and whether 118 should be targeted for 0.6.
> 
> But those are just my 2 cents,
> 
> Markus
> 
> 
> On Mon, Feb 16, 2009 at 1:27 PM, Oleg Kalnichevski (JIRA)
> <mime4j-dev@james.apache.org> wrote:
>>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Oleg Kalnichevski reassigned MIME4J-118:
>> ----------------------------------------
>>
>>    Assignee: oleg.kalnichevski
>>
>> Working on a patch
>>
>> Oleg
>>
>>> MIME stream parser handles non-ASCII fields incorrectly
>>> -------------------------------------------------------
>>>
>>>                 Key: MIME4J-118
>>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>             Project: JAMES Mime4j
>>>          Issue Type: Bug
>>>            Reporter: Oleg Kalnichevski
>>>            Assignee: oleg.kalnichevski
>>>             Fix For: 0.6
>>>
>>>
>>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field
>>> content gets converted to its textual representation too early in the parsing process using
>>> simple byte to char cast. The decision about appropriate char encoding should be left up to
>>> individual ContentHandler implementations.
>>> Oleg
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
> 
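
To illustrate the problem described in MIME4J-118, here is a small sketch of why an early byte-to-char cast loses information, and how keeping the raw bytes (as suggested for Field in the thread) defers the charset decision to the consumer. `ByteCastDemo` and `RawField` are hypothetical names, not Mime4j classes, and the `& 0xff` mask is an assumption about how the cast was done.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not Mime4j code.
public class ByteCastDemo {

    // Early conversion: every byte becomes exactly one char. With the
    // 0xff mask this is equivalent to decoding as ISO-8859-1.
    public static String castDecode(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xff));
        }
        return sb.toString();
    }

    // A field that keeps its raw bytes so the consumer (for example a
    // ContentHandler) can choose the charset later.
    public static final class RawField {
        private final byte[] raw;

        public RawField(byte[] raw) {
            this.raw = raw.clone();
        }

        public String decode(Charset charset) {
            return new String(raw, charset);
        }
    }

    public static void main(String[] args) {
        // A header written unencoded in UTF-8 (assumed example data).
        byte[] raw = "Subject: caff\u00e8".getBytes(StandardCharsets.UTF_8);

        String early = castDecode(raw);
        String late = new RawField(raw).decode(StandardCharsets.UTF_8);

        System.out.println(early.length()); // 15: the two UTF-8 bytes of U+00E8 became two chars
        System.out.println(late.length());  // 14: decoded with the right charset
        System.out.println(early.equals(new String(raw, StandardCharsets.ISO_8859_1))); // true
    }
}
```

Once the cast has happened, a consumer that knows the header was UTF-8 can only recover the text by converting the string back to bytes, which is the extra work that storing a byte[] avoids.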

