james-mime4j-dev mailing list archives

From Stan Ioan Eugen ...@ieugen.ro>
Subject Re: Thunderbird Mailbox support (patch included)
Date Sun, 10 Aug 2014 08:33:49 GMT
Hello Wolfgang,

Sorry for my late reply. I've created a Jira ticket to track this
issue. As Eric suggested, that is the right way to get code into the
project.
I've looked over the code and it looks good in general. I would keep
both variants of the regular expression that matches From lines, each
with good javadoc, so users can choose either one in their code. I would
also move the 'mbox != null' check inside the constructor; this way
we make sure we don't create an object in an inconsistent state.
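
To make the constructor suggestion concrete, here is a minimal sketch. The
class name MboxSource, its fields and its method are hypothetical and are
not taken from the patch or from MboxIterator; the sketch only shows the
idea of validating arguments once, in the constructor, so later code such
as the error message can rely on them.

```java
import java.io.File;
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical class for illustration only; not the real MboxIterator.
final class MboxSource {
    private final File mbox;
    private final Pattern fromLine;

    MboxSource(File mbox, String fromLineRegex) {
        // Reject bad arguments up front so no half-initialised object can exist.
        this.mbox = Objects.requireNonNull(mbox, "mbox must not be null");
        this.fromLine = Pattern.compile(
                Objects.requireNonNull(fromLineRegex, "fromLineRegex must not be null"),
                Pattern.MULTILINE);
    }

    // No null check needed here: the constructor guarantees mbox is set.
    String describeMissingFromLines() {
        return "File " + mbox.getPath()
                + " does not contain From_ lines that match the pattern '"
                + fromLine.pattern() + "'! It may not be a valid mbox.";
    }
}
```

With this shape the 'if (mbox != null)' guard around the error message is
no longer needed, because a null file never gets past the constructor.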

I will be more than happy to push the patch upstream once we have some
tests for the new behavior. Are you interested in providing the tests?

Please use the issue for patch submission and relevant comments.
https://issues.apache.org/jira/browse/MIME4J-242

Thanks,


2014-08-03 10:52 GMT+03:00 Eric Charles <eric@apache.org>:
> Could you open an issue in JIRA at https://issues.apache.org/jira/browse/MIME4J
> and upload your patch there? Thx.
>
> On 07/23/2014 09:57 AM, Wolfgang Fahl wrote:
>> Hi Ioan Eugen,
>>
>> please find attached a patch.
>>
>> it uses the following fromline pattern:
>> static final String DEFAULT = "^From \\S+.*\\d{4}$";
>> so that it matches more lines.
>> 1. From ieugen@apache.org Fri Sep 09 14:04:52 2011
>> 2. From MAILER-DAEMON Wed Oct 05 21:54:09 2011
>> 3. From - Wed Apr 02 06:51:08 2014
>>
>> so looking for an "@" sign is not enforced any more.
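
The effect of the relaxed pattern on the three sample lines can be checked
with a small standalone sketch. RELAXED is the pattern quoted above; STRICT
is only an assumed stand-in for a variant that insists on an '@', since the
thread does not quote the original expression. The class name FromLineCheck
is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of mime4j.
final class FromLineCheck {
    // Relaxed pattern quoted in this mail.
    static final Pattern RELAXED = Pattern.compile("^From \\S+.*\\d{4}$");
    // Assumption: a stricter variant that requires an '@' in the sender token.
    static final Pattern STRICT = Pattern.compile("^From \\S+@\\S.*\\d{4}$");

    // True if the whole line matches the given From-line pattern.
    static boolean isFromLine(Pattern pattern, String line) {
        return pattern.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "From ieugen@apache.org Fri Sep 09 14:04:52 2011",
            "From MAILER-DAEMON Wed Oct 05 21:54:09 2011",
            "From - Wed Apr 02 06:51:08 2014"
        };
        for (String line : samples) {
            System.out.printf("relaxed=%b strict=%b  %s%n",
                    isFromLine(RELAXED, line), isFromLine(STRICT, line), line);
        }
    }
}
```

All three samples satisfy the relaxed pattern, while only the first one
contains an '@' and so satisfies the assumed strict variant.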
>>
>> The patch fixes a typo:
>> -    private Matcher fromLineMathcer;
>> +    private Matcher fromLineMatcher;
>>
>> in many places of the source code.
>>
>> It adds a reference to the original mbox File so that the error message:
>> +                 if (mbox!=null)
>> +                       path=mbox.getPath();
>> +            throw new IllegalArgumentException("File "+path+" does not
>> contain From_ lines that match the pattern
>> '"+MESSAGE_START.pattern()+"'! Maybe not be a valid Mbox.");
>>
>> can be improved.
>>
>> Who is going to check this patch and what needs to be done to get it
>> into the official repo?
>> I would also like to add more test cases and especially include some
>> dummy mboxes. And as mentioned I'd like to check the iterator against
>> all my Thunderbird mboxes to check
>> whether it will successfully parse them all. Also I am offering to write
>> a few "tutorial lines". Where would I have to put these?
>>
>> Cheers
>>   Wolfgang
>>
>> On 22.07.14 22:23, Ioan Eugen Stan wrote:
>>> Hello Wolfgang,
>>>
>>> I developed MboxIterator. It's nice to see that it's helpful :)
>>>
>>> You get that error because MboxIterator does not know how to split the
>>> messages. Messages in an mbox file are separated by lines that start
>>> with 'From ' (the word 'From' followed by a space). They are called
>>> (by me at least) 'From lines' :) .
>>> One problem with the mbox format is that it's a bit 'free-form', in the
>>> sense that developers abused it and we have some variants [1].
>>>
>>> One thing that you could try is to supply a different From line
>>> regular expression to MboxIterator via regexpPattern argument. It will
>>> split messages based on this new value.
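
As an illustration of why the From line expression decides how messages
are split, here is a self-contained sketch that splits an in-memory mbox
string with java.util.regex. It does not use MboxIterator or any mime4j
API; the class MboxSplitSketch and its split method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; it mimics the splitting idea, not MboxIterator itself.
final class MboxSplitSketch {
    // Splits mbox text into messages; each message keeps its From line.
    static List<String> split(String mbox, String fromLineRegex) {
        // MULTILINE lets ^ and $ match at every line boundary, not just at
        // the start and end of the whole input.
        Matcher m = Pattern.compile(fromLineRegex, Pattern.MULTILINE).matcher(mbox);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        if (starts.isEmpty()) {
            throw new IllegalArgumentException(
                    "No From lines match the pattern '" + fromLineRegex + "'");
        }
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            // A message runs from its From line up to the next From line.
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : mbox.length();
            messages.add(mbox.substring(starts.get(i), end));
        }
        return messages;
    }
}
```

If the expression matches no line at all, there is nothing to split on,
which is the situation the IllegalArgumentException in this thread reports.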
>>>
>>> [1] http://wiki2.dovecot.org/MailboxFormat/mbox
>>>
>>> Good luck and please post your results.
>>>
>>> Regards,
>>>
>>> On Fri, Jul 18, 2014 at 12:53 PM, Wolfgang Fahl <wf@bitplan.com> wrote:
>>>> Dear mime4j developers,
>>>>
>>>> for one of my projects I have been using mime4j successfully to import
>>>> e-mail into our CRM database for some two years now.
>>>> Currently I am trying to add a feature which would allow reading Mozilla
>>>> Thunderbird Mailbox content.
>>>> As of mime4j 0.8 there seems to be an MboxIterator which could do that.
>>>> Since I didn't find any publicly available source repository which I
>>>> could use to access the 0.8-SNAPSHOT, I have copied
>>>> the three source files:
>>>> * CharBufferWrapper.java
>>>> * FromLinePatterns.java
>>>> * MboxIterator.java
>>>>
>>>> into my source tree and I am using these together with the following
>>>> maven dependency:
>>>>
>>>> <!-- EMail handling -->
>>>>         <dependency>
>>>>             <groupId>org.apache.james</groupId>
>>>>             <artifactId>apache-mime4j-core</artifactId>
>>>>             <version>0.7.2</version>
>>>>         </dependency>
>>>>         <dependency>
>>>>             <groupId>org.apache.james</groupId>
>>>>             <artifactId>apache-mime4j-dom</artifactId>
>>>>             <version>0.7.2</version>
>>>>         </dependency>
>>>>
>>>> The iterator works reasonably well on some of the Thunderbird mailbox
>>>> files and loops through the mails in them correctly.
>>>> The mails cannot then be parsed directly with mime4j, because there is one
>>>> newline at the beginning which spoils the show. After
>>>> working around this it works as expected in some cases. In other
>>>> cases there is an error:
>>>>
>>>> java.lang.IllegalArgumentException: File does not contain From_ lines! Maybe not be a vaild Mbox.
>>>>     at org.apache.james.mime4j.mboxiterator.MboxIterator.initMboxIterator(MboxIterator.java:85)
>>>>     at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:75)
>>>>     at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:62)
>>>>     at org.apache.james.mime4j.mboxiterator.MboxIterator$Builder.build(MboxIterator.java:241)
>>>>     at com.bitplan.clientutils.ThunderbirdMailArchiveImpl.getMailById(ThunderbirdMailArchiveImpl.java:386)
>>>>     at com.bitplan.clientutils.ThunderbirdMailArchiveImpl.getMailById(ThunderbirdMailArchiveImpl.java:261)
>>>>     at com.bitplan.clientutils.rest.TestMailAccess.testMailById(TestMailAccess.java:77)
>>>>
>>>> By the way - there is a typo in the above error message "vaild" should
>>>> be "valid".
>>>>
>>>> The error is something I'd like to fix or work-around.
>>>>
>>>> I have two big user accounts with several hundred mailbox files and some
>>>> 300,000 mails from the last 15 years which I'd like
>>>> to use as a test case against which to run the mime4j implementation.
>>>>
>>>> Would you please supply me with some pointers to where I can get the necessary
>>>> source code and how I could supply patches and
>>>> test cases for the project?
>>>>
>>>> Also it would be good to know whether others would be interested in the
>>>> Thunderbird Mailbox reading capability.
>>>>
>>>>
>>>> Cheers
>>>>   Wolfgang
>>>>
>>>> --
>>>>
>>>> BITPlan - smart solutions
>>>> Wolfgang Fahl
>>>> Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
>>>> Tel. +49 2154 811-480, Fax +49 2154 811-481
>>>> Web: http://www.bitplan.de
>>>> BITPlan GmbH, Willich - HRB 6820 Krefeld, Tax No.: 10258040548, Managing Director: Wolfgang Fahl
>>>>
>>>
>>>
>>



-- 
Ioan Eugen Stan / ieugen.ro
