Subject: Re: Thunderbird Mailbox support
From: Ioan Eugen Stan
To: mime4j-dev@james.apache.org
Date: Tue, 22 Jul 2014 23:23:01 +0300
In-Reply-To: <53C8EE81.9000206@bitplan.com>

Hello Wolfgang,

I developed MboxIterator. It's nice to see that it's helpful :)

You get that error because MboxIterator does not know how to split the
messages. Messages in an mbox file are separated by lines that start
with 'From ' (the word "From" followed by a space). They are called
(by me at least) 'From lines' :)

One problem with the mbox format is that it's a bit 'free-form':
developers have abused it, so several variants exist [1].

One thing you could try is to supply a different From line regular
expression to MboxIterator via the regexpPattern argument. It will then
split messages based on this new value.

[1] http://wiki2.dovecot.org/MailboxFormat/mbox

Good luck, and please post your results.

Regards,

On Fri, Jul 18, 2014 at 12:53 PM, Wolfgang Fahl wrote:
> Dear mime4j developers,
>
> for one of my projects I have been using mime4j successfully to import
> e-mail into our CRM database for some two years now.
> Currently I am trying to add a feature which would allow reading Mozilla
> Thunderbird Mailbox content.
> As of mime4j 0.8 there seems to be a MboxIterator which could do that.
> Since I didn't find any publicly available source repository which I
> could use to access the 0.8 snapshot I have copied
> the three source files:
> * CharBufferWrapper.java
> * FromLinePatterns.java
> * MboxIterator.java
>
> into my source tree and I am using these together with the following
> maven dependency:
>
> <dependency>
>   <groupId>org.apache.james</groupId>
>   <artifactId>apache-mime4j-core</artifactId>
>   <version>0.7.2</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.james</groupId>
>   <artifactId>apache-mime4j-dom</artifactId>
>   <version>0.7.2</version>
> </dependency>
>
> The iterator works somewhat o.k. on some of the Thunderbird mailbox
> files and loops through the mails in it correctly.
> The mails can then not be directly parsed with mime4j - there is one
> newline at the beginning which spoils the show. After
> working around this it's working as expected in some cases. In other
> cases there is an error:
>
> java.lang.IllegalArgumentException: File does not contain From_ lines! Maybe not be a vaild Mbox.
>     at org.apache.james.mime4j.mboxiterator.MboxIterator.initMboxIterator(MboxIterator.java:85)
>     at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:75)
>     at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:62)
>     at org.apache.james.mime4j.mboxiterator.MboxIterator$Builder.build(MboxIterator.java:241)
>     at com.bitplan.clientutils.ThunderbirdMailArchiveImpl.getMailById(ThunderbirdMailArchiveImpl.java:386)
>     at com.bitplan.clientutils.ThunderbirdMailArchiveImpl.getMailById(ThunderbirdMailArchiveImpl.java:261)
>     at com.bitplan.clientutils.rest.TestMailAccess.testMailById(TestMailAccess.java:77)
>
> By the way - there is a typo in the above error message: "vaild" should
> be "valid".
>
> The error is something I'd like to fix or work around.
>
> I have two big user accounts with several hundred mailbox files and some
> 300,000 mails from the last 15 years which I'd like
> to use as a testcase against which to run the mime4j implementation.
>
> Would you please supply me with some pointers where I get the necessary
> source code and how I could supply patches and
> testcases for the project?
>
> Also it would be good to know whether others would be interested in the
> Thunderbird Mailbox reading capability.
>
>
> Cheers
> Wolfgang
>
> --
>
> BITPlan - smart solutions
> Wolfgang Fahl
> Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
> Tel. +49 2154 811-480, Fax +49 2154 811-481
> Web: http://www.bitplan.de
> BITPlan GmbH, Willich - HRB 6820 Krefeld, Steuer-Nr.: 10258040548, Geschäftsführer: Wolfgang Fahl
>

--
Ioan Eugen Stan
0720 898 747
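
To make the From-line splitting discussed in this thread concrete, here is a minimal sketch in plain Java. It does not use MboxIterator or any mime4j class. The class name FromLineSplitSketch, the split and clean helpers, and the default pattern are all hypothetical and chosen only for illustration; the pattern mime4j actually ships may differ. The sketch also strips the line break left at the start of each message, which is one possible workaround for the leading newline mentioned in the quoted mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of mime4j.
class FromLineSplitSketch {

    // Assumed separator: a line starting with "From " plus a non-space token.
    // Mbox variants differ, so this is a parameter rather than a constant rule.
    static final String DEFAULT_FROM_LINE = "^From \\S+.*$";

    // Splits mbox text into messages, dropping the From lines themselves.
    static List<String> split(String mbox, String fromLineRegex) {
        // MULTILINE lets ^ and $ match at every line boundary.
        Pattern pattern = Pattern.compile(fromLineRegex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(mbox);
        List<String> messages = new ArrayList<>();
        int messageStart = -1;
        while (matcher.find()) {
            if (messageStart >= 0) {
                messages.add(clean(mbox.substring(messageStart, matcher.start())));
            }
            messageStart = matcher.end();
        }
        if (messageStart < 0) {
            // Same situation as the error in the thread: nothing to split on.
            throw new IllegalArgumentException("No From lines matched: " + fromLineRegex);
        }
        messages.add(clean(mbox.substring(messageStart)));
        return messages;
    }

    // Removes line breaks before the first header so a MIME parser
    // sees the header block immediately.
    static String clean(String raw) {
        return raw.replaceFirst("^[\\r\\n]+", "");
    }

    public static void main(String[] args) {
        String mbox = "From alice@example.org Tue Jul 22 20:23:28 2014\n"
                + "Subject: first\n\nbody one\n"
                + "From bob@example.org Tue Jul 22 20:24:00 2014\n"
                + "Subject: second\n\nbody two\n";
        for (String message : split(mbox, DEFAULT_FROM_LINE)) {
            System.out.println("--- message ---");
            System.out.print(message);
        }
    }
}
```

If a particular mailbox file uses a different separator shape, passing another regular expression as the second argument changes where the text is split. Note that a body line that itself starts with "From " would be treated as a separator by this sketch; some mbox variants escape such lines as ">From ", which is one reason the variants matter.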