james-mime4j-dev mailing list archives

From Markus Wiederkehr <markus.wiederk...@gmail.com>
Subject Re: Bug in DecoderUtil
Date Mon, 17 Aug 2009 14:29:42 GMT
On Mon, Aug 17, 2009 at 3:22 PM, Aron Wieck<aw@cnt.net> wrote:
>> > assertEquals("Test ü  and more", DecoderUtil.decodeEncodedWords("Test
>> > =?ISO-8859-1?Q?=FC_?= =?ISO-8859-1?Q?and_more?="));
>> Coincidentally, the same problem was reported yesterday by Wim
>> Jongman. Funny how bugs like this can somehow remain undetected for
>> years and then show up all of a sudden.
> This then qualifies as a Schroedinbug:
> http://catb.org/~esr/jargon/html/S/schroedinbug.html


>> > After this fix there is only one space between "ü" and "and", which I
>> > think
>> > is not correct (but I'm not sure).
>> No I think one space would be correct, see MIME4J-104.
> My bad! Sorry.
>> > Proposed Solution:
>> >
>> > Replace "indexOf" by Regex matching, like so:
>> > [...]
>> I'm afraid that would reintroduce MIME4J-104.
> If you are interested I could write a regex-based version which will not
> reintroduce the double space bug.
> I'd use the regex to extract charset, encoding and encoded string in one
> go. I think it will be at least as fast as the current method.
> However, java.util.regex requires Java 1.4, if that's a no-go I won't
> bother.

Regex wouldn't be a problem since Mime4j already depends on Java 5.

I'm not sure how a regex solution could compete with a few indexOf and
substring calls in terms of speed, though. Pattern.compile() alone has
to parse the pattern string and build an internal matcher structure
from it.
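
For illustration only, here is a minimal sketch of what the regex-based approach discussed above might look like. It is not the Mime4j implementation: the class name EncodedWordSketch and its methods are hypothetical, it assumes java.util.Base64 (Java 8 or later), and it ignores several RFC 2047 details such as line folding and length limits. The pattern is compiled once into a static field so the cost of Pattern.compile() is paid only at class load, and a whitespace-only gap between two adjacent encoded words is dropped, which is the one-space behaviour referred to as MIME4J-104 above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Mime4j.
final class EncodedWordSketch {
    // Compiled once. Groups: 1 = charset, 2 = encoding (Q or B), 3 = encoded text.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([QqBb])\\?([^?\\s]*)\\?=");

    static String decode(String input) {
        Matcher m = ENCODED_WORD.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;                 // end of the previous match
        boolean prevDecoded = false;  // was the previous match decoded successfully?
        while (m.find()) {
            String gap = input.substring(last, m.start());
            String decoded = decodeWord(m.group(1), m.group(2), m.group(3));
            // Whitespace between two adjacent encoded words is not part of the text.
            boolean dropGap = prevDecoded && decoded != null && gap.trim().isEmpty();
            if (!dropGap) {
                out.append(gap);
            }
            // Fall back to the raw token if it could not be decoded.
            out.append(decoded != null ? decoded : m.group());
            prevDecoded = decoded != null;
            last = m.end();
        }
        return out.append(input.substring(last)).toString();
    }

    // Returns null if the charset is unknown or the encoded text is malformed.
    private static String decodeWord(String charsetName, String encoding, String text) {
        try {
            Charset charset = Charset.forName(charsetName);
            byte[] bytes = encoding.equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(text)
                    : decodeQ(text);
            return new String(bytes, charset);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Q encoding: '_' stands for a space, "=XX" for the byte with hex value XX.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=') {
                if (i + 3 > text.length()) {
                    throw new IllegalArgumentException("truncated escape");
                }
                int hi = Character.digit(text.charAt(i + 1), 16);
                int lo = Character.digit(text.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad hex escape");
                }
                bytes.write((hi << 4) | lo);
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode(
                "Test =?ISO-8859-1?Q?=FC_?= =?ISO-8859-1?Q?and_more?="));
    }
}
```

With the input from the test case quoted at the top of this message, the first encoded word contributes "ü " (the underscore decodes to a space), the space between the two encoded words is dropped, and the second word contributes "and more", so exactly one space separates "ü" and "and". Whether this is actually faster or slower than the indexOf/substring version would have to be measured.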

I'd like to give it a try by refactoring and fixing the existing code.


> Thanks for your quick response.
