community-commits mailing list archives

From "Sebb (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (COMDEV-161) mailglomper.py may count a message multiple times
Date Sat, 26 Sep 2015 00:27:04 GMT

     [ https://issues.apache.org/jira/browse/COMDEV-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved COMDEV-161.
-------------------------
    Resolution: Fixed

URL: http://svn.apache.org/viewvc?rev=1705389&view=rev
Log:
COMDEV-161 mailglomper.py may count a message multiple times
Fixed RE to look for "From " at the start of a line
Also changed code to read data by line rather than slurping entire mailbox into memory
Added some timestamp traces to check on performance

Modified:
    comdev/reporter.apache.org/trunk/mailglomper.py
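
The approach described in the log (match "From " at the start of a line, and read the mailbox line by line) can be sketched roughly as follows. This is an illustration only, not the code committed in r1705389; the names FROM_LINE, count_messages and SAMPLE_MBOX are hypothetical, and the sample mailbox is assembled from the example lines quoted in the issue.

```python
import io
import re

# Assumption: a message separator is any line that begins with "From ".
# Body lines that start with "From " are stored as ">From ", so they do not match.
FROM_LINE = re.compile(r"^From ")


def count_messages(lines):
    """Count messages by iterating over lines, never holding the whole mailbox."""
    count = 0
    for line in lines:
        if FROM_LINE.match(line):
            count += 1
    return count


# Hypothetical two-message mailbox; the first body contains a quoted ">From " line.
SAMPLE_MBOX = (
    "From user@example.com Thu May 01 05:10:32 2008\n"
    "Date: Thu, 01 May 2008 05:06:51 +0000\n"
    "\n"
    "Author: cml\n"
    "Date: Wed Sep 16 19:06:03 2015\n"
    ">From this point on the body continues\n"
    "\n"
    "From user@example.com Thu May 01 05:10:32 2008\n"
    "Date: Thu, 01 May 2008 05:06:51 +0000\n"
    "\n"
    "Second message body\n"
)

# io.StringIO stands in for an open mailbox file; both yield one line at a time.
message_count = count_messages(io.StringIO(SAMPLE_MBOX))
```

Passing an open file object instead of io.StringIO would work the same way, since iterating over a file also yields one line at a time.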


> mailglomper.py may count a message multiple times
> -------------------------------------------------
>
>                 Key: COMDEV-161
>                 URL: https://issues.apache.org/jira/browse/COMDEV-161
>             Project: Community Development
>          Issue Type: Bug
>          Components: Reporter Tool
>            Reporter: Sebb
>
> The mailglomper.py script counts messages by matching /Date: (.*)/.
> It is looking to match header lines of the form:
> Date: Thu, 01 May 2008 05:06:51 +0000
> However, such lines are not guaranteed to be unique within a message.
> In particular, SVN commit messages have a "Date:" line which matches, and
> the parsed timestamp will be much the same as the header date. For example:
> Author: cml
> Date: Wed Sep 16 19:06:03 2015
> New Revision: 1703436
> The mailbox format currently used by the ASF guarantees that each message
> is prefixed with a line in the format:
> From user@example.com Thu May 01 05:10:32 2008
> [Lines in the message body starting "From " are prefixed as ">From "; the
> prefix is removed when messages are extracted]
> Only lines starting "From " are guaranteed not to occur in message bodies.
> The problem is trivial to fix, but it will change the generated statistics,
> particularly for mailboxes that receive SVN commit messages (Git commits use
> a different prefix for the timestamp). SVN mails will generally be counted twice.
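
The double counting described in the issue can be reproduced with a small sketch. It assumes the original count was effectively the number of /Date: (.*)/ matches over the mailbox text; the variable names and the single-message sample (built from the lines quoted in the issue) are hypothetical.

```python
import re

# Hypothetical mailbox holding one SVN commit mail: a header "Date:" line
# plus a second "Date:" line inside the commit message body.
SAMPLE_MBOX = (
    "From user@example.com Thu May 01 05:10:32 2008\n"
    "Date: Thu, 01 May 2008 05:06:51 +0000\n"
    "\n"
    "Author: cml\n"
    "Date: Wed Sep 16 19:06:03 2015\n"
    "New Revision: 1703436\n"
)

# Pattern from the issue: matches the header line and the body line alike.
date_matches = re.findall(r"Date: (.*)", SAMPLE_MBOX)

# Separator-based count: only the "From " line at the start of a line matches.
from_matches = re.findall(r"^From .*", SAMPLE_MBOX, flags=re.MULTILINE)

print(len(date_matches), len(from_matches))
```

For this one-message sample the "Date:" pattern finds two matches while the "From " pattern finds one, which is the mismatch the issue reports for SVN commit mails.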



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
