james-mime4j-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oleg Kalnichevski <ol...@apache.org>
Subject Re: [jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly
Date Tue, 17 Feb 2009 19:36:41 GMT
Markus Wiederkehr wrote:
> Let me reply to the mailing list to keep the discussion on the detail
> level away from the JIRA..
> On Tue, Feb 17, 2009 at 12:29 PM, Oleg Kalnichevski (JIRA)
> <mime4j-dev@james.apache.org> wrote:
>>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> Oleg Kalnichevski updated MIME4J-118:
>> -------------------------------------
>>    Attachment: mime4j-118.patch
>> Here is the first cut at fixing the problem with charset coding in MIME fields. The
significant changes are
>> (1) EntityStateMachine#getField() returns ByteArrayBuffer instead of a String
>> (2) ContentHandler#field takes ByteArrayBuffer instead of a String
> Your patch does not look bad, but:
> I have a feeling that it will not help with MIME4J-116, maybe on the
> contrary. You say you do not want AbstractEntity to create Field
> instances yet you give no rationalization for your opinion.. 

I believe I did, but I can certainly re-iterate.

Tight coupling is believed to be bad, I _personally_ tend to agree. By 
coupling EntityStateMachine with the Field class we will end up coupling 
it, albeit indirectly, with pretty much the entire 
org.apache.james.mime4j.field package

EntityStateMachine -> Field -> DefaultFieldParser -> several concrete 
implementations of FieldParser

This is a very _bad_ idea, in my opinion.

I could live with it, if Field were an interface.

I don't
> see how the ByteArrayBuffer can help with the unnecessary duplicate
> field parsing.

It was not meant to. To me, these two issues are related, but not the same.

> I have no problem with small changes one at a time. But it does not
> hurt to keep an eye on related issues in the process.
> Also I don't like that ByteArrayBuffer is a mutable class (and yes, I
> am aware that byte[] is mutable too). Please consider that Field is
> designed to be immutable. This is an important aspect because a Field
> may be shared between multiple messages. So if you refactor Field to
> store a ByteArrayBuffer please make sure that the ByteArrayBuffer does
> not get exposed publicly.
> Maybe we should introduce an immutable object that holds a byte array;
> similar to what String is for a character array. That would still not
> help with #116 though..

Yes, this is a good idea, but in this case the class would have to make 
a copy of the byte array _every_ time someone wanted to get access to 
the underlying raw content, for instance, in order to parse it using a 
different parser or write it out to an OutputStream. This is the only 
way I can see to ensure immutability of such a class. Interestingly 
enough, this can get quite expansive if occurs frequently.

We solved a similar issue in HttpCore in the following way.

We have Header interface that represents an immutable HTTP header and we 
have FormattedHeader interface extending Header that provides access to 
the underlying CharArrayBuffer. One would have to cast Header to 
FormattedHeader in order to be able to mutate its content.


> Markus
>> If these changes are okay with everyone I'll proceed with refactoring and change
Field and friends to utilize ByteArrayBuffer for storing raw field content.
>> Oleg
>>> MIME stream parser handles non-ASCII fields incorrectly
>>> -------------------------------------------------------
>>>                 Key: MIME4J-118
>>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>             Project: JAMES Mime4j
>>>          Issue Type: Bug
>>>            Reporter: Oleg Kalnichevski
>>>            Assignee: Oleg Kalnichevski
>>>             Fix For: 0.6
>>>         Attachments: mime4j-118.patch
>>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field
content gets converted to its textual representation too early in the parsing process using
simple byte to char cast. The decision about appropriate char encoding should be left up to
individual ContentHandler implementations.
>>> Oleg
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.

View raw message