Subject: Re: Standards for mail archive statistics gathering?
From: Louis Suárez-Potts
Date: Tue, 5 May 2015 19:22:59 -0400
To: dev@community.apache.org

> On 05 May 2015, at 07:33, Boris Baldassari wrote:
>
> Hi Folks,
>
> Sorry for the late answer on this thread. Don't know what has been done
> since then, but I've some experience to share on this, so here are my 2c..
>
> * Parsing dates and time zones:
> If you are to use Perl, the Date::Parse module handles dates and time zones
> pretty well. As for Python I don't know -- there probably is a module for
> that too..
> I used Date::Parse to parse ASF mboxes (notably for Ant and JMeter, the data
> sets have been published here [0]), and it worked great.
> I do have a Perl script to do that, which I can provide -- but I have no
> access I'm aware of in the dev scm, and not sure if Perl is the most common
> language here.. so please let me know.
>
> * Parsing mboxes for software repository data mining:
> There is a suite of tools exactly targeted at this kind of duty on github:
> Metrics Grimoire [1], developed (and used) by Bitergia [2]. I don't know how
> they manage time zones, but the toolsuite is widely used around (see [3] or
> [4] as examples) so I believe they are quite robust. It includes tools for
> data retrieval as well as visualisation.
>
> * As for the feedback/thoughts about the architecture and formats:
> I love the REST-API idea proposed by Rob. That's really easy to access and
> retrieve through scripts on-demand. CSV and JSON are my favourite formats,
> because they are, again, easy to parse and widely used -- every language and
> library has some facility to read them natively.

I have to endorse Bitergia, too. If they don’t immediately have what is
wanted, they are likely to be interested in working on it. But you know this,
I’m guessing.

louis

> Cheers,
>
> [0] http://castalia.solutions/datasets/
> [1] https://metricsgrimoire.github.io/
> [2] http://bitergia.com
> [3] Eclipse Dashboard: http://dashboard.eclipse.org/
> [4] OpenStack Dashboard: http://activity.openstack.org/dash/browser/
>
> --
> Boris Baldassari
> Castalia Solutions -- Elegant Software Engineering
> Web: http://castalia.solutions
> Phone: +33 6 48 03 82 89
>
> On 28/04/2015 16:11, Rich Bowen wrote:
>>
>> On 04/27/2015 09:36 AM, Shane Curcuru wrote:
>>> I'm interested in working on some visualizations of mailing list
>>> activity over time, in particular some simple analyses, like thread
>>> length/participants and the like.
>>> Given that the raw data can all be
>>> precomputed from mbox archives, is there any semi-standard way to
>>> distill and save metadata about mboxes?
>>>
>>> If we had a generic static database of past mail metadata and statistics
>>> (i.e. not details of contents, but perhaps overall # of lines of text or
>>> something), it would be interesting to see what kinds of visualizations
>>> that different people would come up with.
>>>
>>> Anyone have pointers to either a data format or the best parsing library
>>> for this? I'm trying to think ahead, and work on the parsing, storing
>>> statistics, and visualizations as separate pieces so it's easier for
>>> different people to collaborate on something.
>>
>> Roberto posted something to the list a month or so ago about the efforts
>> that he's been working on for this kind of thing. You might ping him.
>>
>> --Rich
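
On the open question in the thread about a Python counterpart to Date::Parse: a minimal sketch, using only the Python standard library, of how Date headers with time zones could be normalised to UTC and per-thread metadata (message count, participants, first and last post) distilled from an mbox into JSON. The function names (to_utc, thread_root, thread_stats, mbox_to_json), the output fields, and the thread-grouping heuristic (first entry of References, then In-Reply-To, then the message's own Message-ID) are assumptions made for illustration; nothing here comes from the tools mentioned in the thread.

```python
import json
import mailbox
from collections import defaultdict
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header):
    """Parse an RFC 2822 Date header into an aware UTC datetime, or None."""
    if not date_header:
        return None
    try:
        parsed = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError):
        # Unparseable header; the exception type depends on the Python version.
        return None
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treating it as UTC
        # is an assumption of this sketch.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def thread_root(message):
    """Guess the thread a message belongs to (heuristic, see lead-in)."""
    references = str(message.get("References") or "").split()
    if references:
        return references[0]
    in_reply_to = str(message.get("In-Reply-To") or "").split()
    if in_reply_to:
        return in_reply_to[0]
    return str(message.get("Message-ID") or "").strip()


def thread_stats(messages):
    """Return {thread id: metadata} holding counts and dates, no message bodies."""
    threads = defaultdict(
        lambda: {"messages": 0, "participants": set(), "first": None, "last": None}
    )
    for message in messages:
        entry = threads[thread_root(message)]
        entry["messages"] += 1
        sender = str(message.get("From") or "").strip()
        if sender:
            entry["participants"].add(sender)
        sent = to_utc(message.get("Date"))
        if sent is not None:
            if entry["first"] is None or sent < entry["first"]:
                entry["first"] = sent
            if entry["last"] is None or sent > entry["last"]:
                entry["last"] = sent
    return {
        root: {
            "messages": entry["messages"],
            "participants": len(entry["participants"]),
            "first": entry["first"].isoformat() if entry["first"] else None,
            "last": entry["last"].isoformat() if entry["last"] else None,
        }
        for root, entry in threads.items()
    }


def mbox_to_json(path):
    """Read the mbox file at `path` and return the thread statistics as JSON text."""
    box = mailbox.mbox(path, create=False)
    return json.dumps(thread_stats(box), indent=2, sort_keys=True)
```

Calling mbox_to_json on a downloaded archive file (any path is a placeholder) would give a static JSON document that separate visualization code could consume, which keeps parsing, stored statistics, and visualization as separate pieces. Grouping by the first References entry is crude; clients that omit or truncate References will split threads.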