hadoop-yarn-issues mailing list archives

From "Craig Welch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
Date Wed, 06 May 2015 23:08:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531635#comment-14531635 ]

Craig Welch commented on YARN-1680:


bq. Actually I think this statement may not be true, assume we compute an accurate headroom for
app, but that doesn't mean the app can get as much resource as we compute...you may not be
able to get it after hours.

This would only occur if other applications were allocated those resources, in which case
the headroom would drop and the application would be made aware of it via headroom updates.
The scenario you propose as a counterexample is therefore inaccurate. It is the case that accurate
headroom (including a fix for the blacklist issue here) will result in faster overall job
completion than the reactionary approach of waiting for allocation failures.


bq. OTOH, blacklisting / hard-locality are app-decisions. From the platform's perspective,
those nodes, free or otherwise, are actually available for apps to use

Not quite so: the scheduler respects the blacklist and doesn't allocate containers to an
app when doing so would run counter to the app's blacklisting.

That said, so far the discussion regarding the proposal has largely been about where the activity
should live; let's put that aside for a moment and concentrate on the approach itself. With
API additions, additional library work, etc., it should be possible to do the same thing outside
the scheduler as within it. Whether and what to do in or out of the scheduler still needs to
be settled, of course, but a decision on how the headroom will be adjusted is needed in any
case, and it is needed before putting together the change, wherever it ends up living.


"where app headroom is finalized" == in the scheduler OR in a library available/used by AM's.
 if externalized, obviously api's to get whatever info is not yet available outside the scheduler
will need to be added

- Retain a node/rack blacklist where app headroom is finalized (already the case)
- Add a "last change" timestamp or incrementing counter to track node addition/removal at the
cluster level (which is what exists for cluster black/white listing, afaict), updated when
those events occur
- Add a "last change" timestamp/counter where app headroom is finalized to track blacklist
changes
- Keep "last updated" values where app headroom is finalized to track the above two "last
change" values, updated when blacklist values are recalculated
- On headroom calculation, where app headroom is finalized, check whether there are any entries
in the blacklist or a "blacklist deduction" value in its ResourceUsage entry (see below),
to determine whether the blacklist must be taken into account
- If the blacklist must be taken into account, check the "last updated" values for both cluster
and app blacklist changes; if and only if either is stale (last updated != last change),
recalculate the blacklist deduction
- When calculating the blacklist deduction, use Chen He's basic logic from the existing patches.
Place the deduction value where app headroom is finalized. NodeLabels could be taken into
account as well: only blacklist entries which match the node-label expression used by the
application would be added to the deduction, if such an expression is in play
- Whenever the headroom is generated where app headroom is finalized, apply the blacklist
deduction
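The steps above can be sketched roughly as follows. This is a minimal, self-contained illustration, not actual YARN code: the class name, fields, and methods (BlacklistHeadroomTracker, nodeUpdated, headroom, etc.) are all hypothetical stand-ins for "where app headroom is finalized", and the counters stand in for the proposed "last change" / "last updated" timestamps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the staleness-checked blacklist deduction.
// None of these names are real YARN APIs.
class BlacklistHeadroomTracker {
    // "Last change" counters: bumped on cluster node events and on
    // app blacklist additions/removals respectively.
    private long clusterLastChange = 0;
    private long appBlacklistLastChange = 0;

    // "Last updated" values recorded when the deduction was last recalculated.
    private long clusterLastSeen = -1;
    private long appBlacklistLastSeen = -1;

    private final Set<String> blacklist = new HashSet<>();
    private final Map<String, Long> nodeFreeMemoryMb = new HashMap<>();
    private long cachedDeductionMb = 0;

    void nodeUpdated(String node, long freeMb) {
        nodeFreeMemoryMb.put(node, freeMb);
        clusterLastChange++;
    }

    void addToBlacklist(String node) {
        if (blacklist.add(node)) {
            appBlacklistLastChange++;
        }
    }

    // Headroom = cluster-wide available resource minus the blacklist
    // deduction; the deduction is recalculated only when stale.
    long headroomMb(long availableMb) {
        // Nothing blacklisted and no prior deduction: nothing to do.
        if (blacklist.isEmpty() && cachedDeductionMb == 0) {
            return availableMb;
        }
        boolean stale = clusterLastSeen != clusterLastChange
                || appBlacklistLastSeen != appBlacklistLastChange;
        if (stale) {
            long deduction = 0;
            for (String node : blacklist) {
                deduction += nodeFreeMemoryMb.getOrDefault(node, 0L);
            }
            cachedDeductionMb = deduction;
            clusterLastSeen = clusterLastChange;
            appBlacklistLastSeen = appBlacklistLastChange;
        }
        return Math.max(0, availableMb - cachedDeductionMb);
    }
}
```

Under this sketch, an app that has blacklisted a node with 3GB free would see that 3GB removed from its reported headroom, and the deduction is only recomputed when either the cluster membership or the app's blacklist has actually changed since the last calculation.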

> availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes
free memory.
> ------------------------------------------------------------------------------------------------------
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3 
>            Reporter: Rohith
>            Assignee: Craig Welch
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start
is set to 1.
> A job is running; its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became
unstable (3 map tasks got killed), so the MRAppMaster blacklisted it. All reducer
tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation,
the headroom includes blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager
does not assign any new containers on blacklisted nodes but returns an availableResources value
that considers cluster free memory).

This message was sent by Atlassian JIRA