hadoop-mapreduce-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
Date Sat, 10 May 2014 22:04:04 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994140#comment-13994140
] 

Ming Ma commented on MAPREDUCE-5465:
------------------------------------

Thanks, Jason! We have discussed the performance implication in https://issues.apache.org/jira/browse/YARN-221.
It is good to revisit the issue.

1. I assume job latency is the metric we want to use. The question is how much this change
affects job latency.

2. Say the umbilical notification arrives at t1, the task receives T_ATTEMPT_SUCCEEDED or
T_ATTEMPT_FAILED at t2, and MRAppMaster acquires new containers from the RM for the next set
of tasks at t3.

3. How much does (t2-t1) impact job latency? It depends on the job characteristics: with a
smaller (t2-t1), mapper output can be available sooner, reducer containers can be scheduled
sooner, etc. But the impact isn't going to be linear in the number of tasks, given that tasks
run in parallel, so it should be much smaller. I don't have a formula. It would be useful to
compare the performance difference using actual jobs.

4. Your suggestion of notifying the task/job right after t1 is a good idea for reducing (t2-t1).
I assume it doesn't change the state transitions of the task attempt. We need to confirm
correctness from the state machine point of view, given there might be some assumptions
between the task attempt and task state machines.

5. (t3-t1) can also impact job latency. Notifying the task/job earlier won't help reduce (t3-t1).

6. To reduce (t3-t1), perhaps the NM should send an OutofBandHeartBeat when a container exits.
Currently OutofBandHeartBeat is sent only when stopContainer is called. Perhaps this is useful
when the NM->RM heartbeat interval is large.

7. There appears to be an issue with how the current stopContainer triggers NodeStatusUpdaterImpl's
OutofBandHeartBeat processing. stopContainer first enqueues the "kill" container event and then
calls NodeStatusUpdaterImpl's OutofBandHeartBeat. So it is possible that the NodeStatusUpdaterImpl
heartbeat thread sends the heartbeat to the RM before the main Dispatcher thread processes the
event and marks the container as completed. In that case the OutofBandHeartBeat doesn't include
that container in the completed container list. Does stopContainer really need to call
NodeStatusUpdaterImpl's OutofBandHeartBeat? It seems better to call it only when a container
exits.
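
To make the ordering problem in point 7 concrete, here is a minimal, single-threaded sketch in Python. It is a toy model and not Hadoop code: the class NodeManagerModel and the names enqueue_kill, dispatch_one, heartbeat and stop_container_then_heartbeat are hypothetical, and the real NodeManager uses separate threads rather than an explicit flag. The flag only selects which of the two possible interleavings is replayed.

```python
from collections import deque


class NodeManagerModel:
    """Toy stand-in for the NM state discussed above (hypothetical, not a YARN API)."""

    def __init__(self):
        self.events = deque()   # events waiting for the dispatcher
        self.completed = []     # containers marked completed but not yet reported

    def enqueue_kill(self, container_id):
        # stopContainer only enqueues the kill event; nothing is marked completed yet.
        self.events.append(container_id)

    def dispatch_one(self):
        # The dispatcher processes one event and marks that container completed.
        if self.events:
            self.completed.append(self.events.popleft())

    def heartbeat(self):
        # A heartbeat reports whatever is marked completed at this instant.
        report = list(self.completed)
        self.completed.clear()
        return report


def stop_container_then_heartbeat(dispatcher_runs_first):
    """Replay one interleaving; return the (out-of-band, next) heartbeat reports."""
    nm = NodeManagerModel()
    nm.enqueue_kill("container_1")
    if dispatcher_runs_first:
        nm.dispatch_one()
    out_of_band = nm.heartbeat()   # heartbeat triggered right after stopContainer
    nm.dispatch_one()              # dispatcher catches up (no-op if it already ran)
    next_regular = nm.heartbeat()  # the following regular heartbeat
    return out_of_band, next_regular
```

When the heartbeat wins the race (dispatcher_runs_first=False), the out-of-band report is empty and the completed container is only reported on the next heartbeat, so the out-of-band heartbeat gained nothing. Triggering the heartbeat from the container-exit path, as suggested above, corresponds to the dispatcher_runs_first=True ordering in this model.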

> Container killed before hprof dumps profile.out
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5465
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: trunk, 2.0.3-alpha
>            Reporter: Radim Kolar
>            Assignee: Ming Ma
>         Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, MAPREDUCE-5465-4.patch,
>                      MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, MAPREDUCE-5465.patch
>
>
> If profiling is enabled for a mapper or reducer, hprof dumps profile.out at process exit,
> after the task has signaled to the AM that its work is finished.
> The AM kills the container with finished work without waiting for hprof to finish dumping.
> If hprof is dumping larger outputs (such as with depth=4, while depth=3 works), it cannot
> finish the dump before being killed, making the entire dump unusable because CPU and heap
> stats are missing.
> There needs to be a longer delay before the container is killed if profiling is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
