james-mime4j-dev mailing list archives

From "Tim Allison (JIRA)" <mime4j-...@james.apache.org>
Subject [jira] [Updated] (MIME4J-249) Date parser could be more robust
Date Wed, 18 May 2016 16:08:12 GMT

     [ https://issues.apache.org/jira/browse/MIME4J-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated MIME4J-249:
-------------------------------
    External issue URL:   (was: https://issues.apache.org/jira/browse/TIKA-1970)

> Date parser could be more robust
> --------------------------------
>
>                 Key: MIME4J-249
>                 URL: https://issues.apache.org/jira/browse/MIME4J-249
>             Project: James Mime4j
>          Issue Type: Improvement
>          Components: parser (core)
>    Affects Versions: 0.7.2
>            Reporter: Tim Allison
>
> On TIKA-1970, [~philipp.steinkrueger@uni-koeln.de] submitted an email that he generated
> in Mac Mail by "saving as text." The file is available
> [here|https://issues.apache.org/jira/secure/attachment/12804129/Testemail-nodate.txt].
> The date is of format {{16 May 2016 at 09:30:32 GMT+1}}, and we're getting a {{null}} when
> we use the LenientFieldParser to parse the date field.
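
A minimal sketch of one way a fallback could accept the {{16 May 2016 at 09:30:32 GMT+1}} shape, written in plain Java without any Mime4j calls. The class name MacMailDateSketch, the method parse, and the regular expression are hypothetical and only illustrate the idea; this is not the Tika workaround and not how LenientFieldParser behaves. Like the parser described above, it returns null when the value does not match.

```java
import java.time.DateTimeException;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mime4j or Tika.
final class MacMailDateSketch {

    // Groups: 1 day, 2 month name, 3 year, 4 hour, 5 minute, 6 second,
    // 7 offset sign, 8 offset hours, 9 offset minutes (7-9 are optional).
    private static final Pattern MAC_MAIL = Pattern.compile(
            "^\\s*(\\d{1,2})\\s+(\\p{Alpha}{3})\\s+(\\d{4})\\s+at\\s+"
            + "(\\d{1,2}):(\\d{2}):(\\d{2})\\s+"
            + "GMT(?:([+-])(\\d{1,2})(?::?(\\d{2}))?)?\\s*$");

    private static final List<String> MONTHS = Arrays.asList(
            "jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec");

    // Returns the parsed date, or null if the value has a different shape
    // or contains out-of-range fields.
    static OffsetDateTime parse(String value) {
        Matcher m = MAC_MAIL.matcher(value);
        if (!m.matches()) {
            return null;
        }
        int month = MONTHS.indexOf(m.group(2).toLowerCase(Locale.ROOT)) + 1;
        if (month == 0) {
            return null; // three letters, but not an English month abbreviation
        }
        int offsetSeconds = 0;
        if (m.group(7) != null) {
            int hours = Integer.parseInt(m.group(8));
            int minutes = m.group(9) == null ? 0 : Integer.parseInt(m.group(9));
            int sign = "-".equals(m.group(7)) ? -1 : 1;
            offsetSeconds = sign * (hours * 3600 + minutes * 60);
        }
        try {
            return OffsetDateTime.of(
                    Integer.parseInt(m.group(3)), month, Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)),
                    Integer.parseInt(m.group(6)), 0,
                    ZoneOffset.ofTotalSeconds(offsetSeconds));
        } catch (DateTimeException e) {
            return null; // e.g. 31 Feb, hour 25, or an offset beyond +/-18:00
        }
    }
}
```

Under these assumptions, MacMailDateSketch.parse("16 May 2016 at 09:30:32 GMT+1") yields 2016-05-16T09:30:32+01:00. A hand-written pattern is used here rather than a formatter pattern letter so that the handling of the short {{GMT+1}} offset is explicit.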
> After fixing that, I ran our Mime4j wrapper on ~19k rfc822 files from our regression
> testing corpus. I found that Mime4j could not parse the date in ~3700 of those files (~20%).
> I added some substantial workarounds in Tika, and would be happy to contribute test files/code.
> I also found that Mime4j was misparsing dates of format {{14 Dec 95 00:16:22 GMT}} as
> the year 95 A.D.
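
For the two-digit year case, a small sketch of the widening rule from the obsolete date syntax in RFC 2822 (section 4.3): two-digit years 00-49 map to 2000-2049, 50-99 map to 1950-1999, and three-digit years have 1900 added. The class and method names are hypothetical, and whether Mime4j adopts exactly this rule is an assumption.

```java
// Hypothetical helper, not part of Mime4j.
final class TwoDigitYearSketch {

    // Widens a year token as it appeared in the header. The token is taken
    // as a string so that "95" (two digits) and "0095" (four digits) can be
    // told apart. Throws NumberFormatException for non-numeric input.
    static int normalizeYear(String yearToken) {
        int year = Integer.parseInt(yearToken);
        switch (yearToken.length()) {
            case 2:
                // 00-49 -> 2000-2049, 50-99 -> 1950-1999
                return year < 50 ? 2000 + year : 1900 + year;
            case 3:
                // three-digit years: add 1900
                return 1900 + year;
            default:
                // four or more digits (or a single digit) are left unchanged
                return year;
        }
    }
}
```

With this rule, the year token in {{14 Dec 95 00:16:22 GMT}} is read as 1995 rather than 95 A.D.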



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
