community-dev mailing list archives

From "Sebb (JIRA)" <j...@apache.org>
Subject [jira] [Created] (COMDEV-292) Mailglomper does not handle renamed lists well
Date Sun, 29 Jul 2018 13:48:00 GMT
Sebb created COMDEV-292:
---------------------------

             Summary: Mailglomper does not handle renamed lists well
                 Key: COMDEV-292
                 URL: https://issues.apache.org/jira/browse/COMDEV-292
             Project: Community Development
          Issue Type: Bug
          Components: Reporter Tool
            Reporter: Sebb


The mailglomper script does not account for renamed mailing lists.
This can result in a project's activity being counted twice.

For example, commits@libcloud was renamed to notifications@libcloud in March 2014.

However, the maildata_extended.json file still contains weekly epoch entries
for commits, running from:
1507161600 2017-10-05 00:00:00 UTC
to
1524096000 2018-04-19 00:00:00 UTC

whereas notifications has:
1515024000 2018-01-04 00:00:00 UTC
to
1531958400 2018-07-19 00:00:00 UTC

The weekly counts agree for the overlap period.
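
As a rough illustration of the overlap, the sketch below intersects the two epoch ranges quoted above and counts the weekly entries they share. The helper names (overlap, fmt) are made up for this example and are not part of mailglomper.

```python
from datetime import datetime, timezone

WEEK = 7 * 24 * 60 * 60  # seconds between consecutive weekly entries


def overlap(range_a, range_b):
    """Return the (start, end) epoch range common to both, or None."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    return (start, end) if start <= end else None


def fmt(ts):
    """Format an epoch timestamp the same way as in the report."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )


# First and last weekly epochs quoted in the report.
commits = (1507161600, 1524096000)
notifications = (1515024000, 1531958400)

common = overlap(commits, notifications)
# Both ends are inclusive, hence the +1.
weeks = (common[1] - common[0]) // WEEK + 1
print(fmt(common[0]), "to", fmt(common[1]), "=", weeks, "weekly entries")
```

Every weekly entry in that shared range is a candidate for double counting.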

A possible explanation: if the commits mbox files were still present up to April 2018,
there would still be an index entry for the old list, and if a redirect was also in place,
the code would see the redirected files under the old list name.

I think the code should probably ignore redirects if that's possible.
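
One way to ignore redirects, assuming the fetch is done with Python's standard urllib (an assumption, not something checked against the mailglomper source), is an opener whose redirect handler declines to follow 3xx responses. The names NoRedirect and fetch_or_none are invented for this sketch.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow any 3xx response."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the redirect
        # instead of silently fetching the new location.
        return None


opener = urllib.request.build_opener(NoRedirect)


def fetch_or_none(url):
    """Return the body at url, or None if the server redirects or errors."""
    try:
        with opener.open(url) as response:
            return response.read()
    except urllib.error.HTTPError:
        return None
```

With this, a request for an old list's mbox file that answers with a redirect would be treated as missing rather than being read from the renamed list.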

When a list is renamed, the old data ought to be dropped; otherwise it may be double-counted.
The obsolete entries will also gradually accumulate.
This applies to both the maildata_weekly.json and maildata_extended.json files.
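
A minimal sketch of dropping the old data, assuming the JSON data is a dict keyed by list name and that a table of known renames is available. Both the key format ("libcloud-commits") and the renames mapping are hypothetical and may not match the real file layout.

```python
def drop_renamed(maildata, renames):
    """Return a copy of maildata without entries for renamed lists.

    maildata: dict mapping list name -> per-week data (assumed layout).
    renames:  dict mapping old list name -> new list name (hypothetical).
    An old entry is dropped only when the new list is present, so no
    data is lost if the rename table is wrong or out of date.
    """
    obsolete = {old for old, new in renames.items() if new in maildata}
    return {name: data for name, data in maildata.items() if name not in obsolete}


# Illustrative data only; the counts are invented.
maildata = {
    "libcloud-commits": {"1515024000": 12},
    "libcloud-notifications": {"1515024000": 12},
    "libcloud-dev": {"1515024000": 3},
}
renames = {"libcloud-commits": "libcloud-notifications"}

cleaned = drop_renamed(maildata, renames)
print(sorted(cleaned))
```

The same function could be applied to the data loaded from either file before it is written back.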



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@community.apache.org
For additional commands, e-mail: dev-help@community.apache.org

