hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4568) Throw "early" exception when duplicate files or archives are found in distributed cache
Date Tue, 09 Oct 2012 16:42:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472526#comment-13472526
] 

Robert Joseph Evans commented on MAPREDUCE-4568:
------------------------------------------------

I spoke with Virag about this before he filed the JIRA.  The main goal here is to give
Oozie a way to maintain a bit more of a semblance of backwards compatibility even after
MAPREDUCE-4549 goes in.  They essentially want to de-dupe the entries in the dist cache
that would cause an error.  We originally decided on having an exception thrown because
that would also allow other errors/checks that may show up in the future to be added in.
 I don't think there would be a problem with adding a new API that throws an exception,
provided that API is also added to the 1.x line, perhaps without throwing anything there
because 1.x does not have the same limitations.
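
For what it is worth, the client-side de-dupe Oozie wants can be done before anything is
handed to the framework.  The helper below is only a sketch: CacheFileDeduper is not part
of Hadoop, and it treats "duplicate" as exact-URI equality, which may be narrower than the
check MAPREDUCE-4549 performs.  Job.addCacheFile(URI) and Job.getInstance(Configuration,
String) are the real 2.x APIs it drives.

    import java.net.URI;
    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Hypothetical client-side helper: drop duplicates before they ever
    // reach the distributed cache, preserving the old "silently ignore"
    // behavior on top of a version that would otherwise complain.
    public class CacheFileDeduper {

      public static void addCacheFiles(Job job, Iterable<URI> uris) {
        // LinkedHashSet keeps insertion order while rejecting repeats.
        Set<URI> seen = new LinkedHashSet<>();
        for (URI uri : uris) {
          if (seen.add(uri)) {
            job.addCacheFile(uri);
          }
          // else: skip the duplicate silently, as 1.x effectively did
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "dedupe-example");
        addCacheFiles(job, Arrays.asList(
            new URI("hdfs:///libs/common.jar"),
            new URI("hdfs:///libs/common.jar")));  // second entry is dropped
      }
    }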

I realize that adding new APIs is not ideal, especially since we already have 3 classes
with these types of APIs in them, but it is the only way to maintain backwards compatibility
while evolving the API.
                
> Throw "early" exception when duplicate files or archives are found in distributed cache
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4568
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4568
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Arun C Murthy
>
> According to MAPREDUCE-4549, Hadoop 2.x throws an exception if duplicates are found in
> cacheFiles or cacheArchives.  The exception is thrown during job submission.
> This JIRA is to throw the exception ==early==, when the duplicate entry is first added to
> the Distributed Cache through addCacheFile or addFileToClassPath.
> It will help the client decide whether to fail fast or continue without the duplicated
> entries.
> Alternatively, Hadoop could provide a knob where the user chooses whether to throw an
> error (new behavior) or silently ignore duplicates (old behavior).
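
As a rough illustration of the "early" failure described above: the names below
(EagerCacheChecks, DuplicateCacheEntryException) are hypothetical and not part of the
Hadoop API.  The sketch only shows the shape of an add-time check that lets the caller
either fail fast or swallow the exception and keep the old silently-ignore behavior.

    import java.net.URI;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical sketch: reject a duplicate the moment it is added,
    // instead of waiting for job submission.
    public class EagerCacheChecks {

      /** Hypothetical stand-in for whatever exception the real check would throw. */
      public static class DuplicateCacheEntryException extends Exception {
        public DuplicateCacheEntryException(String msg) {
          super(msg);
        }
      }

      private final Set<URI> cacheFiles = new HashSet<>();

      /** Hypothetical addCacheFile variant that fails fast on a duplicate. */
      public void addCacheFile(URI uri) throws DuplicateCacheEntryException {
        if (!cacheFiles.add(uri)) {
          // Thrown at add time, so the caller can decide right here whether
          // to abort job setup or to ignore the duplicate and continue.
          throw new DuplicateCacheEntryException(
              "Already in distributed cache: " + uri);
        }
      }
    }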

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
