james-mime4j-dev mailing list archives

From "Oleg Kalnichevski (JIRA)" <mime4j-...@james.apache.org>
Subject [jira] [Updated] (MIME4J-196) Lenient parsing of Mailadresses should be a little more lenient
Date Tue, 19 Apr 2011 20:02:05 GMT

     [ https://issues.apache.org/jira/browse/MIME4J-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski updated MIME4J-196:
-------------------------------------

    Fix Version/s: 0.8

I am working on a set of low-level parsing routines that could be used to assemble more lenient
/ tolerant field parsers, but this issue may have to wait until 0.8.

Oleg 

> Lenient parsing of Mailadresses should be a little more lenient
> ---------------------------------------------------------------
>
>                 Key: MIME4J-196
>                 URL: https://issues.apache.org/jira/browse/MIME4J-196
>             Project: JAMES Mime4j
>          Issue Type: Wish
>          Components: parser (core)
>            Reporter: Jens Wilmer
>            Priority: Trivial
>             Fix For: 0.8
>
>
> Parsing a mail address as in https://issues.apache.org/jira/browse/MIME4J-31 results in
> a ParseException. Parsing a mail address starting with a dot (.) also results in a ParseException.
> When parsing an address field with multiple addresses, the exception that occurs while parsing
> a single address is caught and null is returned as the resulting address list. (This breaks
> Tika, as it expects an empty list rather than null.)
> It would be nice if invalid addresses were handled more gracefully in lenient
> mode, and if at least the correct addresses were returned when parsing
> an address list that contains a corrupted address.
> I am using Mime4j via the Apache Tika project to extract text from emails for indexing
> in Lucene. The text stream from Tika is read directly by a Lucene field, and indexing fails if
> an exception is thrown by Mime4j. This currently happens every time a header field contains
> more than 1000 characters, due to Tika using the unusable Mime4j standard configuration
> ( https://issues.apache.org/jira/browse/TIKA-640 ), and every time a malformed email address
> is encountered ( https://issues.apache.org/jira/browse/TIKA-641 ).
> These problems can be taken care of in Tika, but there is no way for Tika to retrieve
> the working mail addresses out of a list if Mime4j returns only null; maybe this problem could
> be addressed in Mime4j.
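
The behaviour requested in the quoted description (skip the corrupted address, keep the valid ones, and return an empty list rather than null) could look roughly like the sketch below. This is not Mime4j code: the class `LenientAddressListSketch` and the methods `parseStrict` and `parseLenient` are hypothetical names invented for illustration, the validity check is a deliberately crude stand-in for a real address parser, and splitting on commas ignores quoted display names and group syntax.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; none of these names come from Mime4j.
final class LenientAddressListSketch {

    // Hypothetical strict parser for a single address. It rejects input
    // without exactly one '@', with an empty local part or domain, or with
    // a leading dot, mimicking a parser that throws on malformed input.
    static String parseStrict(String raw) {
        String s = raw.trim();
        int at = s.indexOf('@');
        if (s.isEmpty() || at <= 0 || at != s.lastIndexOf('@')
                || at == s.length() - 1 || s.startsWith(".")) {
            throw new IllegalArgumentException("Malformed address: " + raw);
        }
        return s;
    }

    // Lenient list parsing: a failure in one entry does not discard the
    // whole list, and the result is never null.
    static List<String> parseLenient(String field) {
        if (field == null || field.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        // Assumption: entries are separated by plain commas.
        for (String part : field.split(",")) {
            try {
                result.add(parseStrict(part));
            } catch (IllegalArgumentException e) {
                // Skip the corrupted entry and continue with the rest.
            }
        }
        return result;
    }
}
```

With this shape, a caller such as Tika would receive the addresses that did parse and could treat an empty list as "nothing usable" without a null check.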

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
