infra-issues mailing list archives

From "Sebb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (INFRA-12759) Mangled mails in archives for July 2005
Date Fri, 01 Sep 2017 14:21:00 GMT

    [ https://issues.apache.org/jira/browse/INFRA-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150607#comment-16150607 ]

Sebb commented on INFRA-12759:
------------------------------

It looks possible to repair the mbox files without needing backup copies, at least in
many cases.

The interleaved messages appear to have been written in chunks of 1024 bytes (presumably due
to buffering).
This makes it much easier to find the breaks.

One approach is:
- Split the mbox into separate files using the '^From ' markers; most of the files will be
normal single messages.
- Search those files for any embedded 'From ' lines; a file that has one contains
interleaved messages (call it Xaald, say).
- Split that file into 1024-byte chunks.
- By visual inspection, determine which chunks belong together.
- Note that the last chunk will normally contain parts of both messages; it will have to be
split by inspection.
- Recombine the 1024-byte chunks into the separate messages.
- Replace the mangled file (Xaald) with the two (or more) rebuilt messages, e.g. Xaald1 Xaald2.
- Create the fixed mbox by concatenating all the message files.
- It should be the same size as the original.
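
The mechanical parts of the steps above could be sketched roughly as follows. This is
only an illustration, not a tested repair tool: the function names are hypothetical, the
"embedded From" check is a heuristic that assumes an address-like token follows 'From '
(it can still give false positives), and deciding which chunks belong together is left
to visual inspection as described.

```python
import re

CHUNK = 1024  # chunk size observed in the mangled archives


def split_mbox(data: bytes) -> list[bytes]:
    """Split raw mbox bytes at every line that starts with b'From '."""
    starts = [m.start() for m in re.finditer(rb"^From ", data, re.MULTILINE)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading bytes so nothing is lost
    return [data[a:b] for a, b in zip(starts, starts[1:] + [len(data)])]


def looks_mangled(msg: bytes) -> bool:
    """Heuristic (assumption): 'From <address>' appearing in mid-line."""
    return re.search(rb"[^\n]From \S+@\S+", msg) is not None


def chunks(msg: bytes, size: int = CHUNK) -> list[bytes]:
    """Cut one message file into fixed-size chunks; the last may be shorter."""
    return [msg[i:i + size] for i in range(0, len(msg), size)]


def regroup(pieces: list[bytes], labels: list[int]) -> dict[int, bytes]:
    """Join pieces by label, in order. The labels come from manual inspection."""
    out: dict[int, bytes] = {}
    for piece, label in zip(pieces, labels):
        out[label] = out.get(label, b"") + piece
    return out
```

Since every helper only slices and rejoins bytes, the total length of the rebuilt
messages can be compared with the length of the original mbox as the final size check.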

> Mangled mails in archives for July 2005
> ---------------------------------------
>
>                 Key: INFRA-12759
>                 URL: https://issues.apache.org/jira/browse/INFRA-12759
>             Project: Infrastructure
>          Issue Type: Bug
>          Components: Mail Archives
>            Reporter: Sebb
>            Assignee: Sebb
>            Priority: Minor
>
> I happened to come across the following line:
> List-Id: <users.spamassassin.apache.From users-return-30385-apmail-spamassassin-pmc-archive=spamassassin.apache.org@spamassassin.apache.org Thu Jul 21 16:34:11 2005
> in the spamassassin-private archive for July 2005 [1]
> It looks like the archiver was interrupted whilst storing message 30384.
> There's another oddity: why is the list id users@spamassassin in the private archive?
> [1] https://mail-search.apache.org/members/private-arch/spamassassin-private/200507.mbox



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
