hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6754) Container Ids for an yarn application should be monotonically increasing in the scope of the application
Date Wed, 24 Aug 2016 16:42:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435248#comment-15435248 ]

Jason Lowe commented on MAPREDUCE-6754:
---------------------------------------

I don't understand the concern about changing the JvmID.  It's not really public and is only
used within the scope of a single job (i.e., not by the JHS or anything like that), so from
my perspective we can completely gut this class to do whatever we want.  Am I missing a use case
for the JvmID that is going to be problematic?  Adding the AM attempt ID to the JvmID seems
like a straightforward and proper fix.  If we're really concerned about backwards compatibility,
then we can preserve the existing methods and add new constructors that take a
specified attempt ID.  The existing constructors can assume an attempt ID of 1.  There's still
the string representation if someone is trying to parse the JvmID string, but again I don't
know where that would be seen by end users outside of AM logs (if it's even there).
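
A minimal sketch of that backwards-compatible variant, in Java.  SketchJvmId is a hypothetical
stand-in, not the real MapReduce JvmID class; its field names and string format are assumptions
made only for illustration.

```java
import java.util.Objects;

// Hypothetical stand-in for the MapReduce JvmID; NOT the real Hadoop class.
final class SketchJvmId {
    private final String jobId;
    private final boolean isMap;
    private final int amAttemptId;   // the proposed new component
    private final long containerId;

    // Existing-style constructor: callers that know nothing about
    // AM attempts are assumed to belong to attempt 1.
    SketchJvmId(String jobId, boolean isMap, long containerId) {
        this(jobId, isMap, 1, containerId);
    }

    // New constructor that takes an explicit AM attempt ID.
    SketchJvmId(String jobId, boolean isMap, int amAttemptId, long containerId) {
        this.jobId = Objects.requireNonNull(jobId);
        this.isMap = isMap;
        this.amAttemptId = amAttemptId;
        this.containerId = containerId;
    }

    int getAmAttemptId() {
        return amAttemptId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchJvmId)) {
            return false;
        }
        SketchJvmId other = (SketchJvmId) o;
        return isMap == other.isMap
            && amAttemptId == other.amAttemptId
            && containerId == other.containerId
            && jobId.equals(other.jobId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobId, isMap, amAttemptId, containerId);
    }

    // Illustrative string form only; the real JvmID format may differ.
    @Override
    public String toString() {
        return "jvm_" + jobId + "_" + (isMap ? "m" : "r")
            + "_" + amAttemptId + "_" + containerId;
    }
}
```

With this shape, two IDs that share a job, task type, and container ID but come from different
AM attempts no longer compare equal, so both can be keys in the same map.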

I'm not a fan of changing YARN's container ID semantics to fix this, as I see this as a MapReduce-specific
problem that has a straightforward solution in MapReduce.

> Container Ids for an yarn application should be monotonically increasing in the scope of the application
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6754
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6754
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Srikanth Sampath
>            Assignee: Srikanth Sampath
>
> Currently, container IDs are reused across application attempts.  The container ID is
> stored in AppSchedulingInfo and is reinitialized with every application attempt, so the
> containerId scope is limited to the application attempt.
> In the MR framework, it is important to note that the containerId is used as part
> of the JvmId.  The JvmId has 3 components: <jobId, "m/r?", containerId>.  The JvmId is used
> in data structures in TaskAttemptListener and is passed between the AppMaster and the individual
> tasks.  Within an application attempt, no two tasks have the same JvmId.
> This works well currently, since in-flight tasks get killed if the AppMaster goes down.
> However, if we want to enable work-preserving restart for the AM, containers (and hence containerIds)
> live across application attempts.  If we recycle containerIds across attempts, then two independent
> tasks (one in flight from a previous attempt and another new in a succeeding attempt) can
> have the same JvmId and cause havoc.
> This can be solved in two ways:
> *Approach A*: Include the attempt ID as part of the JvmId.  This is a viable solution; however,
> it changes the format of the JvmId.  Changing something that has existed for so long
> for the sake of an optional feature is not persuasive.
> *Approach B*: Keep the container ID monotonically increasing for the life
> of an application.  Container IDs are then not reused across application attempts, and containers
> are able to outlive an application attempt.  This is the preferred approach, as it is
> clean, logical, and backwards compatible.  Nothing changes for existing applications or the
> internal workings.
> *How this is achieved:*
> Currently, we maintain the latest containerId only per application attempt and reinitialize
> it for new attempts.  With this approach, we retrieve the latest containerId from the just-failed
> attempt and initialize the new attempt with that containerId (instead of 0).  I can
> provide the patch if it helps.  It currently exists in MAPREDUCE-6726.
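
For comparison, Approach B from the quoted description can be sketched roughly as follows.
SketchContainerIdCounter is hypothetical and does not reflect how AppSchedulingInfo is actually
implemented; it only shows a new attempt seeding its counter from the previous attempt's latest
container ID instead of 0.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-attempt counter; NOT the real YARN AppSchedulingInfo.
final class SketchContainerIdCounter {
    private final AtomicLong latestId;

    // Seed the counter with the latest ID handed out by the previous attempt.
    SketchContainerIdCounter(long latestIdOfPreviousAttempt) {
        this.latestId = new AtomicLong(latestIdOfPreviousAttempt);
    }

    // The first attempt has no predecessor, so it starts from 0.
    static SketchContainerIdCounter forFirstAttempt() {
        return new SketchContainerIdCounter(0L);
    }

    // Hand out the next container ID for this attempt.
    long nextContainerId() {
        return latestId.incrementAndGet();
    }

    // Latest ID handed out so far; a later attempt would start from here.
    long latestContainerId() {
        return latestId.get();
    }
}
```

A second attempt built from the first attempt's latestContainerId() continues where the first
one stopped, so a container that survives the AM restart never shares an ID with a new one.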



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

