From: Tommaso Teofili
Date: Mon, 14 Feb 2011 10:43:54 +0100
Subject: Re: Analysis Engines for mbox like data
To: user@uima.apache.org
In-Reply-To: <4D58F700.6090407@gmail.com>

I agree with Jörn; I think that's the quickest way.

Tommaso

2011/2/14 Jörn Kottmann

> On 2/14/11 4:49 AM, Radhouane Aniba wrote:
>
>> Hello everyone,
>>
>> This is quite an unusual request for this list: I am wondering whether
>> there is any analysis engine that can mine mbox-like formats, such as the
>> well-known Mailman mailing list archives, and structure that kind of data
>> into messages and replies.
>>
>> If anyone has already tackled this topic, I would be very interested in
>> discussing it further.
>>
>
> We have a Tika integration, and Tika supports mbox.
> Maybe that is good enough to do the extraction.
>
> Jörn
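Jörn's suggestion covers extraction; grouping the extracted mail into messages and replies is a separate step. The following is a minimal sketch of that step only, assuming each reply carries an `In-Reply-To` header naming its parent's `Message-ID` (archives that drop or rewrite that header would need subject-based matching instead). It uses Python's standard `mailbox` module rather than UIMA or Tika, and the function names `build_reply_threads` and `print_thread` are hypothetical.

```python
import mailbox
from collections import defaultdict


def build_reply_threads(mbox_path):
    """Link messages in an mbox file into reply threads (hypothetical helper).

    Assumes a reply's In-Reply-To header holds its parent's Message-ID.
    Messages whose parent is absent from the file are treated as roots.
    """
    subjects = {}  # Message-ID -> Subject
    parents = {}   # Message-ID -> parent Message-ID, or None
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            msg_id = (msg["Message-ID"] or "").strip()
            if not msg_id:
                continue  # without an ID the message cannot be linked
            subjects[msg_id] = msg["Subject"] or ""
            parents[msg_id] = (msg["In-Reply-To"] or "").strip() or None
    finally:
        box.close()

    children = defaultdict(list)  # parent Message-ID -> reply IDs, in file order
    for msg_id, parent in parents.items():
        if parent in subjects:
            children[parent].append(msg_id)
    # A message is a root if it has no parent or its parent is not in the file.
    roots = [m for m, p in parents.items() if p not in subjects]
    return roots, children, subjects


def print_thread(msg_id, children, subjects, depth=0):
    """Print a thread as an indented outline of subjects (hypothetical helper)."""
    print("  " * depth + subjects[msg_id])
    for child in children.get(msg_id, []):
        print_thread(child, children, subjects, depth + 1)
```

For example, calling `build_reply_threads("archive.mbox")` (the path is hypothetical) and then `print_thread(root, children, subjects)` for each returned root prints every thread as a subject outline, with replies indented under the message they answer.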