hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantinos Karanasos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
Date Fri, 11 Nov 2016 03:18:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655975#comment-15655975
] 

Konstantinos Karanasos commented on YARN-4597:
----------------------------------------------

Thanks for working on this, [~asuresh]! I am sending some first comments. I have not yet looked
at the {{ContainerScheduler}} -- I will do that tomorrow.

- The {{Container}} has two new methods ({{sendLaunchEvent}} and {{sendKillEvent}}), which
are public and are not following the design of the rest of the code that keeps such methods
private and calls them through transitions in the {{ContainerImpl}}. Let's try to use the
existing design if possible.

- In {{RMNodeImpl}}:
-- Instead of using the {{launchedContainers}} for both the launched and the queued, we might
want to split it in two: one for the launched and one for the queued containers.
-- I think we should not add opportunistic containers to the {{launchContainers}}. If we do,
they will be added to the {{newlyLaunchedContainers}}, then to the {{nodeUpdateQueue}}, and,
if I am not wrong, they will be propagated to the schedulers for the guaranteed containers,
which will create problems. I have to look at it a bit more, but my hunch is that we should
avoid doing it. Even if it does not affect the resource accounting, I don't see any advantage
to adding them.

- In the {{OpportunisticContainerAllocatorAMService}} we are now calling the {{SchedulerNode::allocate}},
and then we do not update the used resources, but we do update some other counters, which
leads to inconsistencies. For example, when releasing a container, I think at the moment we
are not calling the release of the {{SchedulerNode}}, which means that the container count
will become inconsistent.
-- Instead, I suggest to add some counters for opportunistic containers at the {{SchedulerNode}},
both for the number of containers and for the resources used. In this case, we need to make
sure that those resources are released too.

- Maybe as part of a different JIRA, we should at some point extend the {{container.metrics}}
in the {{ContainerImpl}} to keep track of the scheduled/queued containers.

h6. Nits:
- There seem to be two redundant parameters at {{YarnConfiguration}} at the moment: {{NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH}}
and {{NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH}}. If I am not missing something, we should
keep one of the two.
- {{yarn-default.xml}}: numbed -> number (in a comment)
- {{TestNodeManagerResync}}: I think it is better to use one of the existing methods for waiting
to get to the RUNNING state.
- In {{Container}}/{{ContainerImpl}} and all the associated classes, I would suggest to rename
{{isMarkedToKill}} to {{isMarkedForKilling}}. I know it is minor, but it is more self-explanatory.

I will send more comments once I check the {{ContainerScheduler}}. 
Also, let's stress-test the code in a cluster before committing to make sure everything is
good. I can help with that.


> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>              Labels: oct16-hard
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, YARN-4597.003.patch, YARN-4597.004.patch,
YARN-4597.005.patch, YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, YARN-4597.009.patch
>
>
> Currently, the NM immediately launches containers after resource localization. Several
features could be more cleanly implemented if the NM included a separate stage for reserving
resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message