hadoop-mapreduce-issues mailing list archives

From "Ahmed Hussein (Jira)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node
Date Thu, 07 May 2020 15:44:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101802#comment-17101802 ]

Ahmed Hussein commented on MAPREDUCE-7169:
------------------------------------------

[~BilwaST], the speculation, taskAttempt, and allocation code is not a straightforward
module to tackle. You did a great job!

I have the following points:

*Corner Case scenario:*

* Assume that a new speculative attempt X is created. Following the implementation, attempt X
will have blacklisted nodes and skipped racks derived from the original taskAttempt Y.
* Assume that taskAttempt Y is killed before attempt X gets assigned.
* The RMContainerAllocator would still assign a host to attempt X based on the outdated blacklists.

Is this the expected behavior, or should attempt X's blacklisted nodes be cleared?
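
If clearing is the intended behavior, the corner case could be handled roughly as in the sketch below. This is only an illustration under assumptions: {{StaleBlacklistSketch}} and all of its methods are hypothetical names, attempt IDs are plain strings, and nothing here is taken from the patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the corner case; none of these names come from the patch.
class StaleBlacklistSketch {
  // speculative attempt -> nodes it should avoid (derived from the original attempt)
  private final Map<String, Set<String>> pendingBlacklists = new HashMap<>();
  // original attempt -> speculative attempt created for it
  private final Map<String, String> originalToSpeculative = new HashMap<>();

  synchronized void addSpeculative(String original, String speculative,
      Set<String> nodes) {
    pendingBlacklists.put(speculative, new HashSet<>(nodes));
    originalToSpeculative.put(original, speculative);
  }

  // If the original attempt is killed first, the reason for avoiding its
  // node is gone, so the pending blacklist is dropped.
  synchronized void onOriginalKilled(String original) {
    String speculative = originalToSpeculative.remove(original);
    if (speculative != null) {
      pendingBlacklists.remove(speculative);
    }
  }

  // Returns an empty set when nothing is blacklisted for the attempt.
  synchronized Set<String> nodesToAvoid(String speculative) {
    Set<String> nodes = pendingBlacklists.get(speculative);
    if (nodes == null) {
      return Collections.<String>emptySet();
    }
    return Collections.unmodifiableSet(new HashSet<>(nodes));
  }
}
```

With this shape, the allocator would see an empty blacklist for attempt X once taskAttempt Y has been killed.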

*{{TaskAttemptBlacklistManager}}*

* Should that object be synchronized? I believe more than one thread reads and writes
to that object. Perhaps changing {{taskAttemptToEventMapping}} to a {{ConcurrentHashMap}} would
be sufficient. What do you think?
* In {{taskAttemptToEventMapping}}, the data is only removed when the taskAttempt is assigned.
If a taskAttempt is killed before being assigned, {{taskAttemptToEventMapping}} would still
hold its entry.
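
Both points could look roughly like the sketch below. It is a simplified stand-in, not the patch's class: {{BlacklistManagerSketch}}, its method names, and the use of plain strings for attempt IDs and hosts are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for TaskAttemptBlacklistManager.
class BlacklistManagerSketch {
  // ConcurrentHashMap keeps individual puts, gets, and removes thread-safe
  // without synchronizing the whole object.
  private final Map<String, Set<String>> taskAttemptToEventMapping =
      new ConcurrentHashMap<>();

  void register(String attemptId, Set<String> blacklistedNodes) {
    // Store an immutable copy so readers never see later modifications.
    taskAttemptToEventMapping.put(attemptId,
        Collections.unmodifiableSet(new HashSet<>(blacklistedNodes)));
  }

  Set<String> getBlacklistedNodes(String attemptId) {
    Set<String> nodes = taskAttemptToEventMapping.get(attemptId);
    return nodes == null ? Collections.<String>emptySet() : nodes;
  }

  // Existing path: the entry is dropped once a container is assigned.
  void onAssigned(String attemptId) {
    taskAttemptToEventMapping.remove(attemptId);
  }

  // Missing path: also drop the entry when the attempt is killed
  // before assignment, so the map does not keep stale attempts.
  void onKilledBeforeAssignment(String attemptId) {
    taskAttemptToEventMapping.remove(attemptId);
  }

  int size() {
    return taskAttemptToEventMapping.size();
  }
}
```

Note that {{ConcurrentHashMap}} only protects single operations; any check-then-act sequence across several calls would still need its own coordination.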

*{{TaskAttemptImpl}}*
* Racks are going to be blacklisted too, not just nodes. I believe that the javadoc and the description
in default.xml should emphasize that enabling the flag also avoids the local rack unless no
other rack is available for scheduling.

*{{TaskImpl}}*
* Why do we need {{mapTaskAttemptToAvataar}} when each taskAttempt already has a field called {{avataar}}?

*{{ContainerRequestEvent}}*
* This is a design issue. One would expect that a request event's lifetime does not extend beyond the
{{handle()}} call. Therefore, the metadata should be consumed by the handlers. In the patch,
{{ContainerRequestEvent.blacklistedNodes}} could be a field in the taskAttempt. Then you won't
need the {{TaskAttemptBlacklistManager}} class.
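
A rough sketch of that alternative, with the blacklist owned by the attempt itself, is shown below. {{AttemptSketch}} and {{AllocatorSketch}} are hypothetical, heavily simplified names used only for illustration; the real {{TaskAttemptImpl}} and {{RMContainerAllocator}} are far larger and are not reproduced here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified attempt that owns its own blacklist.
class AttemptSketch {
  private final String id;
  // The blacklist lives and dies with the attempt, so no separate
  // manager class or cleanup step is required.
  private final Set<String> blacklistedNodes;

  AttemptSketch(String id, Set<String> blacklistedNodes) {
    this.id = id;
    this.blacklistedNodes =
        Collections.unmodifiableSet(new HashSet<>(blacklistedNodes));
  }

  String getId() {
    return id;
  }

  boolean canRunOn(String host) {
    return !blacklistedNodes.contains(host);
  }
}

// Hypothetical allocator step that consults the attempt directly.
class AllocatorSketch {
  // Returns the first candidate host the attempt accepts, or null if none.
  static String pickHost(AttemptSketch attempt, Iterable<String> candidateHosts) {
    for (String host : candidateHosts) {
      if (attempt.canRunOn(host)) {
        return host;
      }
    }
    return null;
  }
}
```

Because the event no longer carries state that must outlive {{handle()}}, a killed attempt takes its blacklist with it when it is garbage collected.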


> Speculative attempts should not run on the same node
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7169
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: yarn
>    Affects Versions: 2.7.2
>            Reporter: Lee chen
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, MAPREDUCE-7169-003.patch,
MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, image-2018-12-03-09-54-07-859.png
>
>
>           I found that in all versions of YARN, speculative execution may place the speculative
task on the same node as the original task. From what I have read, it only tries to launch one more task
attempt; I haven't seen anything saying that it avoids the same node. This is unreasonable. If the node
has problems that make task execution very slow, then placing the speculative
task on the same node cannot help the problematic task.
>          In our cluster (version 2.7.2, 2700 nodes), this happens almost
every day.
>  !image-2018-12-03-09-54-07-859.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

