james-mime4j-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Burrell Donkin <robertburrelldon...@gmail.com>
Subject Re: Duplicate parsing of some header fields
Date Thu, 05 Feb 2009 19:38:17 GMT
On Wed, Feb 4, 2009 at 1:26 PM, Oleg Kalnichevski <olegk@apache.org> wrote:
> Markus Wiederkehr wrote:
>>
>> On Sun, Feb 1, 2009 at 10:34 PM, Robert Burrell Donkin
>> <robertburrelldonkin@gmail.com> wrote:
>>>
>>> On Fri, Jan 30, 2009 at 7:37 PM, Markus Wiederkehr
>>> <markus.wiederkehr@gmail.com> wrote:
>>>>
>>>> When Mime4j is used to build a DOM some header fields are parsed more
>>>> than once..
>>>
>>> ...i suspected that this might be the case...
>>>
>>>> This is what happens:
>>>>  * AbstractEntity.parseField parses the raw string into a name-value
>>>> pair
>>>>  * AbstractEntity.parseField calls MutableBodyDescriptor.addField for
>>>> valid fields
>>>>  * DefaultBodyDescriptor parses header fields such as Content-Type
>>>>  * MaximalBodyDescriptor parses additional header fields such as
>>>> Mime-Version
>>>>  * eventually MimeStreamParser.parse retrieves the raw field from
>>>> AbstractEntity and notifies a ContentHandler
>>>>  * the ContentHandler (again) has to parse the raw string into name and
>>>> value
>>>>  * the ContentHandler (again) has to parse certain structural fields
>>>> already parsed by a body descriptor
>>>
>>> IIRC the descriptor stuff was added to satisfy requirements for SAX
>>> and pull parsing downstream (in particular IMAP which uses a lot of
>>> MIME meta-data)
>>>
>>>> In case of building a DOM the latter two items are done by
>>>> MessageBuilder by calling Field.parse().
>>>>
>>>> There are several issues here:
>>>>  * the ContentHandler has to do a lot of work that has already been
>>>> done before by AbstractEntity.
>>>
>>> well, probably done before (dependent of parser settings)
>>>
>>>>  * Field.parse() uses a JavaCC-generated parser whereas the body
>>>> descriptors have their own parsing code.
>>>
>>> yes - the body descriptors support a wider variety of fields than the
>>> Field stuff
>>>
>>>>  * we have unnecessary duplicate code and the two parsers are very
>>>> likely inconsistent with each other.
>>>
>>> yes - i trust the code in AbstractEntity more highly then that in Field
>>
>> I'm not so sure.. For example AbstractEntity uses
>> MimeUtil.getHeaderParams() to parse Content-Type and
>> Content-Disposition. It looks like getHeaderParams does not correctly
>> handle comments that are allowed in quoted-strings.. Consider this
>> example:
>>
>>        String test = "text/plain; key=value; foo= (test) \"bar\"";
>>        Map<String, String> headerParams = MimeUtil.getHeaderParams(test);
>>        System.out.println(headerParams);
>>
>> The result is {=text/plain, foo=(test), key=value} which is wrong. The
>> Field equivalent..
>>
>>        ContentTypeField f = Fields.contentType(test);
>>        System.out.println(f.getParameters());
>>
>> ..gives the correct result: {foo=bar, key=value}.
>>
>> But of course we could fix this and use MimeUtil.getHeaderParams() for
>> parsing Fields, too. In fact I would not mind if most of the javacc
>> parsers were replaced by handcrafted equivalents..
>>
>
> Folks
>
> For what is it worth to you. HttpComponents Core provides an extensive
> support for parsing and formatting of MIME headers. The code has been around
> for quite some time, has been relatively well tested, has been extensively
> optimized for performance and memory footprint and is able to operate with
> virtually zero intermediate garbage and no intermediate memory copying.
>
> The parsing/formatting framework is based on the CharArrayBuffer class,
> which is used by MimeTokenStream for low level parsing operations.
>
> http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/HeaderValueParser.java
> http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/BasicHeaderValueParser.java
> http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/HeaderValueFormatter.java
> http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/BasicHeaderValueFormatter.java
>
> Feel free to take a look at the code and see if it might make sense
> replacing MimeUtil.getHeaderParams(value) with some bits from HC.

sounds good - it's good to see that we have quite a few options for
fix this properly in 0.7

- robert

Mime
View raw message