james-mime4j-dev mailing list archives

From Eric Charles <e...@apache.org>
Subject Re: Thunderbird Mailbox support (patch included)
Date Sun, 03 Aug 2014 07:52:24 GMT
Could you open a JIRA issue at https://issues.apache.org/jira/browse/MIME4J
and upload your patch there? Thx.

On 07/23/2014 09:57 AM, Wolfgang Fahl wrote:
> Hi Ioan Eugen,
> 
> please find attached a patch.
> 
> it uses the following fromline pattern:
> static final String DEFAULT = "^From \\S+.*\\d{4}$";
> so that it matches more lines.
> 1. From ieugen@apache.org Fri Sep 09 14:04:52 2011
> 2. From MAILER-DAEMON Wed Oct 05 21:54:09 2011
> 3. From - Wed Apr 02 06:51:08 2014
> 
> so looking for an "@" sign is not enforced any more.
> 
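To make the effect of the relaxed pattern concrete, here is a minimal sketch that checks it against the three sample lines plus an ordinary "From:" header. Only the regular expression is taken from the message; the class name FromLinePatternDemo and the method isFromLine are made up for illustration and are not part of mime4j.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of mime4j.
final class FromLinePatternDemo {
    // The pattern quoted in the message above.
    static final String DEFAULT = "^From \\S+.*\\d{4}$";
    private static final Pattern FROM_LINE = Pattern.compile(DEFAULT);

    // Returns true if a single line looks like an mbox From_ separator line.
    static boolean isFromLine(String line) {
        return FROM_LINE.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "From ieugen@apache.org Fri Sep 09 14:04:52 2011",
            "From MAILER-DAEMON Wed Oct 05 21:54:09 2011",
            "From - Wed Apr 02 06:51:08 2014",
            // An ordinary header line, which must not be treated as a separator.
            "From: Wolfgang Fahl <wf@bitplan.com>"
        };
        for (String line : lines) {
            System.out.println(isFromLine(line) + "  " + line);
        }
    }
}
```

The three separator lines should be reported as matches and the "From:" header as a non-match, since the pattern requires a space directly after "From" and a four-digit year at the end of the line. When matching against a whole file rather than single lines, the pattern would presumably need Pattern.MULTILINE so that ^ and $ apply per line.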
> The patch fixes a typo:
> -    private Matcher fromLineMathcer;
> +    private Matcher fromLineMatcher;
> 
> in many places of the source code.
> 
> It adds a reference to the original mbox File so that the error message:
> +            if (mbox != null)
> +                path = mbox.getPath();
> +            throw new IllegalArgumentException("File " + path
> +                + " does not contain From_ lines that match the pattern '"
> +                + MESSAGE_START.pattern() + "'! Maybe not be a valid Mbox.");
> 
> can be improved.
> 
> Who is going to check this patch and what needs to be done to get it
> into the official repo?
> I would also like to add more test cases and especially include some
> dummy mboxes. And as mentioned I'd like to check the iterator against
> all my Thunderbird mboxes to check
> whether it will successfully parse them all. Also I am offering to write
> a few "tutorial lines". Where would I have to put these?
> 
> Cheers
>   Wolfgang
> 
> On 22.07.14 22:23, Ioan Eugen Stan wrote:
>> Hello Wolfgang,
>>
>> I developed MboxIterator. It's nice to see that it's helpful :)
>>
>> You get that error because MboxIterator does not know how to split the
>> messages. Messages in an mbox file are separated via lines that start
>> with 'From '. They are called (by me at least) 'From lines' :).
>> One problem with the mbox format is that it's a bit 'free-form' in the
>> sense that developers abused it and we have some variants [1].
>>
>> One thing that you could try is to supply a different From line
>> regular expression to MboxIterator via the regexpPattern argument. It will
>> split messages based on this new value.
>>
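As a rough illustration of why the From line expression decides whether splitting works at all, the following self-contained sketch splits mbox text at every line matching a caller-supplied pattern. It is not the MboxIterator implementation and does not use the mime4j API; the class MboxSplitSketch and its split method are hypothetical, and the whole file is assumed to fit in a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the mime4j MboxIterator.
final class MboxSplitSketch {

    // Splits mbox text into messages, starting a new message at every
    // line that matches fromLineRegex.
    static List<String> split(String mbox, String fromLineRegex) {
        // MULTILINE lets ^ and $ match at the start and end of each line.
        Pattern pattern = Pattern.compile(fromLineRegex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(mbox);

        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        if (starts.isEmpty()) {
            // Same failure mode as in the thread: nothing to split on.
            throw new IllegalArgumentException(
                "No From_ lines match the pattern '" + fromLineRegex + "'");
        }

        List<String> messages = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : mbox.length();
            messages.add(mbox.substring(starts.get(i), end));
        }
        return messages;
    }

    public static void main(String[] args) {
        String mbox =
            "From - Wed Apr 02 06:51:08 2014\nSubject: a\n\nbody a\n\n"
          + "From - Wed Apr 02 06:52:00 2014\nSubject: b\n\nbody b\n";
        System.out.println(split(mbox, "^From \\S+.*\\d{4}$").size() + " messages");
    }
}
```

With a pattern that insists on an "@" after "From " (for example the hypothetical "^From \\S+@\\S.*\\d{4}$"), the same Thunderbird-style input yields no separators and the sketch throws, which mirrors the IllegalArgumentException reported in this thread.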
>> [1] http://wiki2.dovecot.org/MailboxFormat/mbox
>>
>> Good luck and please post your results.
>>
>> Regards,
>>
>> On Fri, Jul 18, 2014 at 12:53 PM, Wolfgang Fahl <wf@bitplan.com> wrote:
>>> Dear mime4j developers,
>>>
>>> for one of my projects I have been using mime4j successfully to import
>>> e-mail into our CRM database for some two years now.
>>> Currently I am trying to add a feature which would allow reading Mozilla
>>> Thunderbird Mailbox content.
>>> As of mime4j 0.8 there seems to be a MboxIterator which could do that.
>>> Since I didn't find any publicly available source repository which I
>>> could use to access the 0.8-Snapshot I have copied
>>> the three source files:
>>> * CharBufferWrapper.java
>>> * FromLinePatterns.java
>>> * MboxIterator.java
>>>
>>> into my source tree and I am using these together with the following
>>> maven dependency:
>>>
>>> <!-- EMail handling -->
>>>         <dependency>
>>>             <groupId>org.apache.james</groupId>
>>>             <artifactId>apache-mime4j-core</artifactId>
>>>             <version>0.7.2</version>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>org.apache.james</groupId>
>>>             <artifactId>apache-mime4j-dom</artifactId>
>>>             <version>0.7.2</version>
>>>         </dependency>
>>>
>>> The iterator works somewhat o.k. on some of the Thunderbird mailbox
>>> files and loops through the mails in them correctly.
>>> The mails cannot then be parsed directly with mime4j - there is one
>>> newline at the beginning which spoils the show. After
>>> working around this it's working as expected in some cases. In other
>>> cases there is an error:
>>>
>>> java.lang.IllegalArgumentException: File does not contain From_ lines!
>>> Maybe not be a vaild Mbox.
>>>     at
>>> org.apache.james.mime4j.mboxiterator.MboxIterator.initMboxIterator(MboxIterator.java:85)
>>>     at
>>> org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:75)
>>>     at
>>> org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:62)
>>>     at
>>> org.apache.james.mime4j.mboxiterator.MboxIterator$Builder.build(MboxIterator.java:241)
>>>     at
>>> com.bitplan.clientutils.ThunderbirdMailArchiveImpl.getMailById(ThunderbirdMailArchiveImpl.java:386)
>>>     at
>>> com.bitplan.clientutils.ThunderbirdMailArchiveImpl.getMailById(ThunderbirdMailArchiveImpl.java:261)
>>>     at
>>> com.bitplan.clientutils.rest.TestMailAccess.testMailById(TestMailAccess.java:77)
>>>
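The workaround for the leading newline mentioned above is not shown in the message. One possible sketch, assuming each message is available as a String before it is handed to the parser, is to drop any line breaks at the very start. The class LeadingNewlineWorkaround and the method stripLeadingLineBreaks are hypothetical names, not mime4j API.

```java
// Hypothetical helper, not part of mime4j.
final class LeadingNewlineWorkaround {

    // Removes CR and LF characters from the start of a raw message so that
    // the first line seen by a parser is a header line, not an empty line.
    static String stripLeadingLineBreaks(String rawMessage) {
        int i = 0;
        while (i < rawMessage.length()
                && (rawMessage.charAt(i) == '\r' || rawMessage.charAt(i) == '\n')) {
            i++;
        }
        return rawMessage.substring(i);
    }

    public static void main(String[] args) {
        String raw = "\nSubject: test\n\nbody\n";
        System.out.print(stripLeadingLineBreaks(raw));
    }
}
```

Only the start of the message is touched. The empty line between headers and body stays in place, which matters because an empty line is what normally ends the header section.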
>>> By the way - there is a typo in the above error message "vaild" should
>>> be "valid".
>>>
>>> The error is something I'd like to fix or work-around.
>>>
>>> I have two big user accounts with several hundred mailbox files and some
>>> 300.000 mails from the last 15 years which I'd like
>>> to use as a testcase against which to run the mime4j implementation.
>>>
>>> Would you please supply me with some pointers where I get the necessary
>>> source code and how I could supply patches and
>>> testcases for the project?
>>>
>>> Also it would be good to know whether others would be interested in the
>>> Thunderbird Mailbox reading capability.
>>>
>>>
>>> Cheers
>>>   Wolfgang
>>>
>>> --
>>>
>>> BITPlan - smart solutions
>>> Wolfgang Fahl
>>> Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
>>> Tel. +49 2154 811-480, Fax +49 2154 811-481
>>> Web: http://www.bitplan.de
>>> BITPlan GmbH, Willich - HRB 6820 Krefeld, Tax No.: 10258040548, Managing Director: Wolfgang Fahl
>>>
>>
>>
> 
