community-dev mailing list archives

From Boris Baldassari <castalia.laborat...@gmail.com>
Subject Re: Standards for mail archive statistics gathering?
Date Tue, 05 May 2015 11:33:00 GMT
Hi Folks,

Sorry for the late answer on this thread. I don't know what has been done
since then, but I have some experience to share on this, so here are my 2c.

* Parsing dates and time zones:
If you are using Perl, the Date::Parse module handles dates and time
zones pretty well. As for Python, I don't know, but there is probably a
module for that too.
I used Date::Parse to parse ASF mboxes (notably for Ant and JMeter; the
data sets have been published here [0]), and it worked great. I have a
Perl script that does this and can provide it, but as far as I know I
have no access to the dev SCM, and I'm not sure Perl is the most common
language here, so please let me know.
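
On the Python side, the standard library's email.utils module can parse mail Date headers, including their time zone offsets. The following is only a minimal sketch, not the Perl script mentioned above: the helper name parse_mail_date is made up for illustration, and treating a header without a usable offset as UTC is an assumption you may want to change.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_mail_date(header_value):
    """Parse an RFC 2822 style Date header and normalise it to UTC.

    Hypothetical helper for illustration only.
    """
    dt = parsedate_to_datetime(header_value)
    if dt.tzinfo is None:
        # Assumption: a header with no usable offset is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


if __name__ == "__main__":
    # A message sent at 16:11 local time with a +0200 offset.
    print(parse_mail_date("Tue, 28 Apr 2015 16:11:00 +0200").isoformat())
```

Normalising everything to UTC makes messages from different time zones directly comparable when computing activity over time.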

* Parsing mboxes for software repository data mining:
There is a suite of tools targeted at exactly this kind of task on
GitHub: Metrics Grimoire [1], developed (and used) by Bitergia [2]. I
don't know how it manages time zones, but the tool suite is widely used
(see [3] or [4] for examples), so I believe it is quite robust.
It includes tools for data retrieval as well as visualisation.
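
Independently of those tools, the kind of per-message metadata asked about in the original question can be pulled out of an mbox with Python's standard mailbox module. This is a rough sketch and does not reflect how Metrics Grimoire works internally; the function name mbox_metadata and the choice of fields are assumptions made for illustration.

```python
import mailbox
from email.utils import parseaddr


def mbox_metadata(path):
    """Return one metadata dict per message in the mbox file at `path`.

    Hypothetical helper; the selected fields are an illustrative choice.
    """
    records = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Count lines in text parts only; other parts are ignored.
            n_lines = 0
            for part in msg.walk():
                payload = part.get_payload()
                if part.get_content_maintype() == "text" and isinstance(payload, str):
                    n_lines += len(payload.splitlines())
            records.append({
                "message_id": msg.get("Message-ID"),
                "in_reply_to": msg.get("In-Reply-To"),
                "from": parseaddr(str(msg.get("From", "")))[1],
                "date": msg.get("Date"),
                "subject": msg.get("Subject"),
                "lines": n_lines,
            })
    finally:
        box.close()
    return records
```

The Message-ID and In-Reply-To headers are kept so that threads (length, participants) can be reconstructed later without re-reading the message bodies.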

* As for the feedback/thoughts about the architecture and formats:
I love the REST API idea proposed by Rob: it makes the data really easy
to access and retrieve from scripts on demand. CSV and JSON are my
favourite formats because they are, again, easy to parse and widely used;
most languages can read them natively or through a standard library.
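
To illustrate how little code either format needs, here is a small Python sketch that serialises a list of message-metadata records to both JSON and CSV using only the standard library. The record fields and the function names to_json and to_csv are invented for this example; no format has been agreed on in this thread.

```python
import csv
import io
import json


def to_json(records):
    """Serialise a list of dicts as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records, fields):
    """Serialise a list of dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical records; the field names are an assumption.
    records = [
        {"message_id": "<1@example.org>", "subject": "hello", "lines": 2},
        {"message_id": "<2@example.org>", "subject": "Re: hello", "lines": 1},
    ]
    print(to_json(records))
    print(to_csv(records, ["message_id", "subject", "lines"]))
```

Note that CSV carries no type information, so numeric fields such as the line count come back as strings when the file is read again.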


Cheers,


[0] http://castalia.solutions/datasets/
[1] https://metricsgrimoire.github.io/
[2] http://bitergia.com
[3] Eclipse Dashboard: http://dashboard.eclipse.org/
[4] OpenStack Dashboard: http://activity.openstack.org/dash/browser/



--
Boris Baldassari
Castalia Solutions -- Elegant Software Engineering
Web: http://castalia.solutions
Phone: +33 6 48 03 82 89


On 28/04/2015 16:11, Rich Bowen wrote:
>
>
> On 04/27/2015 09:36 AM, Shane Curcuru wrote:
>> I'm interested in working on some visualizations of mailing list
>> activity over time, in particular some simple analyses, like thread
>> length/participants and the like.  Given that the raw data can all be
>> precomputed from mbox archives, is there any semi-standard way to
>> distill and save metadata about mboxes?
>>
>> If we had a generic static database of past mail metadata and statistics
>> (i.e. not details of contents, but perhaps overall # of lines of text or
>> something), it would be interesting to see what kinds of visualizations
>> that different people would come up with.
>>
>> Anyone have pointers to either a data format or the best parsing library
>> for this?  I'm trying to think ahead, and work on the parsing, storing
>> statistics, and visualizations as separate pieces so it's easier for
>> different people to collaborate on something.
>
> Roberto posted something to the list a month or so ago about the 
> efforts that he's been working on for this kind of thing. You might 
> ping him.
>
> --Rich
>
>

