hadoop-mapreduce-issues mailing list archives

From "Bilwa S T (Jira)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node
Date Sat, 09 May 2020 17:21:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103381#comment-17103381 ]

Bilwa S T commented on MAPREDUCE-7169:

Hi [~ahussein]

What we are trying to achieve here is that a speculative attempt shouldn't be launched on a faulty node. So even if the task gets killed, there is no point launching it on that node, as it will be slow. This is the expected behaviour.
{quote} * Assuming that a new speculative attempt is created. Following the implementation, the new attempt X will have blacklisted nodes and skipped racks relevant to the original taskAttempt.
 * Assuming taskAttempt Y is killed before attempt X gets assigned.
 * The RMContainerAllocator would still assign a host to attempt X based on the dated blacklists.
Is this the expected behavior? Or is it supposed to clear attempt X's blacklisted nodes?{quote}
Yes, I think these two cases should be handled.
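To make the two cases concrete, here is a minimal Java sketch of the idea: a speculative attempt carries a blacklist derived from the original attempt's host, and that blacklist is cleared if the original attempt is killed before the speculative one is assigned. The class and method names ({{SpeculativeBlacklist}}, {{onOriginalAttemptKilled}}, and so on) are illustrative assumptions, not the actual patch API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not the MAPREDUCE-7169 patch API.
public class SpeculativeBlacklist {
    // speculative attempt id -> nodes it must avoid (hosts of the original attempt)
    private final Map<String, Set<String>> blacklistedNodes = new HashMap<>();

    public void addSpeculativeAttempt(String speculativeId, String originalHost) {
        blacklistedNodes.computeIfAbsent(speculativeId, k -> new HashSet<>())
                        .add(originalHost);
    }

    // Case raised in the review: if the original attempt is killed before the
    // speculative attempt is assigned, the stale blacklist should be cleared so
    // the allocator is free to schedule on any node again.
    public void onOriginalAttemptKilled(String speculativeId) {
        blacklistedNodes.remove(speculativeId);
    }

    public Set<String> blacklistFor(String speculativeId) {
        return blacklistedNodes.getOrDefault(speculativeId, new HashSet<>());
    }

    public static void main(String[] args) {
        SpeculativeBlacklist bl = new SpeculativeBlacklist();
        bl.addSpeculativeAttempt("attempt_x", "host1");
        System.out.println(bl.blacklistFor("attempt_x")); // [host1]
        bl.onOriginalAttemptKilled("attempt_x");
        System.out.println(bl.blacklistFor("attempt_x")); // []
    }
}
```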
{quote} * Should that object be synchronized? I believe there is more than one thread reading/writing to that object. Perhaps changing {{taskAttemptToEventMapping}} to a {{ConcurrentHashMap}} would be sufficient. What do you think?
 * In {{taskAttemptToEventMapping}}, the data is only removed when the taskAttempt is assigned. If the taskAttempt is killed before being assigned, {{taskAttemptToEventMapping}} would still have the taskAttempt.{quote}
Will update this.
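A minimal sketch of both fixes together, assuming the mapping is keyed by attempt id: backing {{taskAttemptToEventMapping}} with a {{ConcurrentHashMap}} covers the multi-thread access, and removing the entry on kill as well as on assignment covers the leak. The class and method names here are hypothetical, not the patch's actual ones.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the mapping discussed above. A ConcurrentHashMap lets
// dispatcher and allocator threads touch it safely without external locking.
public class AttemptEventMapping {
    private final ConcurrentMap<String, String> taskAttemptToEventMapping =
            new ConcurrentHashMap<>();

    public void onRequest(String attemptId, String event) {
        taskAttemptToEventMapping.put(attemptId, event);
    }

    public String onAssigned(String attemptId) {
        // consume the entry when the attempt is assigned a container
        return taskAttemptToEventMapping.remove(attemptId);
    }

    public void onKilled(String attemptId) {
        // also remove when the attempt is killed before being assigned;
        // otherwise the map would retain the dead attempt forever
        taskAttemptToEventMapping.remove(attemptId);
    }

    public int size() {
        return taskAttemptToEventMapping.size();
    }
}
```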
{quote} * Racks are going to be blacklisted too, not just nodes. I believe that the javadoc and the description in default.xml should emphasize that enabling the flag also avoids the local rack unless no other rack is available for scheduling.{quote}
Actually, when a task attempt is killed, its Avataar is VIRGIN by default. This is a defect which needs to be addressed: if a speculative task attempt is killed, it is relaunched as a normal task attempt.
{quote} * Why do we need {{mapTaskAttemptToAvataar}} when each taskAttempt has a field called {{avataar}}?{quote}
How do you get taskAttempt details in RMContainerAllocator?
{quote} - That's a design issue. One would expect that a RequestEvent's lifetime should not survive the {{handle()}} call. Therefore, the metadata should be consumed by the handlers. In the patch, {{ContainerRequestEvent.blacklistedNodes}} could be a field in taskAttempt. Then you won't need the {{TaskAttemptBlacklistManager}} class.{quote}
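The design the reviewer describes can be sketched as follows: the request event carries the blacklisted nodes only for the duration of {{handle()}}, and the handler copies them onto the attempt itself, so no separate manager class keeps event state alive. All the types below ({{ContainerRequestEvent}}, {{TaskAttemptInfo}}, {{AllocatorSketch}}) are simplified stand-ins, not the real Hadoop classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the reviewer's suggestion: event metadata is consumed
// inside handle(), and the blacklist lives on the attempt, not in a manager.
public class AllocatorSketch {
    static class ContainerRequestEvent {
        final String attemptId;
        final Set<String> blacklistedNodes;
        ContainerRequestEvent(String attemptId, Set<String> blacklistedNodes) {
            this.attemptId = attemptId;
            this.blacklistedNodes = blacklistedNodes;
        }
    }

    static class TaskAttemptInfo {
        Set<String> blacklistedNodes = Collections.emptySet();
    }

    final Map<String, TaskAttemptInfo> attempts = new HashMap<>();

    // The event's metadata is consumed here; its lifetime ends with handle().
    void handle(ContainerRequestEvent event) {
        TaskAttemptInfo info = attempts.computeIfAbsent(
                event.attemptId, k -> new TaskAttemptInfo());
        info.blacklistedNodes = event.blacklistedNodes;
    }
}
```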

> Speculative attempts should not run on the same node
> ----------------------------------------------------
>                 Key: MAPREDUCE-7169
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: yarn
>    Affects Versions: 2.7.2
>            Reporter: Lee chen
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, MAPREDUCE-7169-003.patch,
MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, image-2018-12-03-09-54-07-859.png
>           I found that in all versions of YARN, Speculative Execution may assign the speculative task to the node of the original task. What I have read only says it will try to have one more task attempt; I haven't seen any place mentioning it should not be on the same node. This is unreasonable: if the node has some problem that makes task execution very slow, then placing the speculative task on the same node cannot help the problematic task.
>          In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears almost
>  !image-2018-12-03-09-54-07-859.png! 

This message was sent by Atlassian Jira

To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org
