hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-1593) support out-of-proc AuxiliaryServices
Date Mon, 13 Jan 2014 23:47:50 GMT
Ming Ma created YARN-1593:

             Summary: support out-of-proc AuxiliaryServices
                 Key: YARN-1593
                 URL: https://issues.apache.org/jira/browse/YARN-1593
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Ming Ma

AuxiliaryServices such as ShuffleHandler currently run in the same process as the NM. There are
some benefits to hosting them in dedicated processes.

1. NM rolling restart. If we want to upgrade YARN, an NM restart forces the ShuffleHandler
to restart as well. If ShuffleHandler runs as a separate process, it can continue to run
during the NM restart. The NM can reconnect to the running ShuffleHandler after restart.

2. Resource management. It is possible that other types of AuxiliaryServices will be implemented.
AuxiliaryServices are considered YARN application specific and could consume a lot of resources.
Running AuxiliaryServices in separate processes allows easier resource management. The NM could
potentially stop a specific AuxiliaryService process if it consumes resources
far above its allocation.
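The "stop it when it goes far above its allocation" check could be sketched roughly as below. This is only an illustration: the class name AuxServiceResourcePolicy, the memory-only accounting, and the overshoot factor are all assumptions made for the example, not anything that exists in YARN today.

```java
/**
 * Hypothetical policy object: decides whether an out-of-proc
 * AuxiliaryService process should be stopped for overusing memory.
 * Nothing here is an existing YARN class.
 */
final class AuxServiceResourcePolicy {
    private final long allocatedBytes;    // assumed per-service allocation
    private final double overshootFactor; // assumed tolerance, e.g. 1.5 = 150%

    AuxServiceResourcePolicy(long allocatedBytes, double overshootFactor) {
        if (allocatedBytes <= 0 || overshootFactor < 1.0) {
            throw new IllegalArgumentException("invalid allocation or factor");
        }
        this.allocatedBytes = allocatedBytes;
        this.overshootFactor = overshootFactor;
    }

    /** Returns true when usage exceeds the allocation times the tolerated factor. */
    boolean shouldStop(long usedBytes) {
        return usedBytes > allocatedBytes * overshootFactor;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 1 GB allocation, stop only above 1.5 GB.
        AuxServiceResourcePolicy policy = new AuxServiceResourcePolicy(1024 * mb, 1.5);
        System.out.println("1200 MB used, stop? " + policy.shouldStop(1200 * mb));
        System.out.println("2048 MB used, stop? " + policy.shouldStop(2048 * mb));
    }
}
```

A check like this is only practical when the service has its own process, since usage inside the NM process cannot be attributed to, or stopped for, a single service.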

Here are some high level ideas:

1. The NM provides a hosting process for each AuxiliaryService. The existing AuxiliaryService API
doesn't change.

2. The hosting process provides an RPC server for the AuxiliaryService proxy object inside the NM to
connect to.

3. During a rolling restart of the NM, the existing AuxiliaryService processes continue to run.
The NM could reconnect to the running AuxiliaryService processes upon restart.

4. Policy and resource management of AuxiliaryServices. So far we don't have an immediate need
for this. An AuxiliaryService could run inside a container so that its resource utilization is
taken into account by the RM, and the RM could determine that a specific type of application
overutilizes cluster resources.
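To make ideas 1 to 3 more concrete, here is a minimal in-memory sketch. It is not a proposed implementation: AuxService is a simplified, hypothetical stand-in for the real AuxiliaryService API, and Host, NmProxy and RecordingService are names invented for this example. The RPC layer is replaced by a direct object reference, so only the shape of the design is shown: the proxy exposes the same API inside the NM, and a restarted NM reconnects to a host whose state survived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class OutOfProcAuxServiceDemo {

    /** Hypothetical, simplified stand-in for the AuxiliaryService API. */
    interface AuxService {
        void initializeApplication(String appId);
        void stopApplication(String appId);
    }

    /** Example service that remembers which applications are active. */
    static final class RecordingService implements AuxService {
        private final Set<String> activeApps = new TreeSet<>();

        @Override public synchronized void initializeApplication(String appId) {
            activeApps.add(appId);
        }
        @Override public synchronized void stopApplication(String appId) {
            activeApps.remove(appId);
        }
        synchronized List<String> activeApps() {
            return new ArrayList<>(activeApps);
        }
    }

    /** Stand-in for the hosting process; a real one would run an RPC server. */
    static final class Host {
        private final AuxService service;
        private int connections = 0;

        Host(AuxService service) { this.service = service; }

        /** Models an NM (re)connecting to the host's endpoint. */
        synchronized AuxService connect() {
            connections++;
            return service;
        }
        synchronized int connectionCount() { return connections; }
    }

    /** Proxy inside the NM: same API as the service, calls forwarded to the host. */
    static final class NmProxy implements AuxService {
        private final AuxService remote;

        NmProxy(Host host) { this.remote = host.connect(); }

        @Override public void initializeApplication(String appId) {
            remote.initializeApplication(appId);
        }
        @Override public void stopApplication(String appId) {
            remote.stopApplication(appId);
        }
    }

    public static void main(String[] args) {
        RecordingService shuffle = new RecordingService();
        Host host = new Host(shuffle);

        AuxService beforeRestart = new NmProxy(host);
        beforeRestart.initializeApplication("app_1");

        // Simulated NM restart: the old proxy is discarded, the host keeps running.
        AuxService afterRestart = new NmProxy(host);
        afterRestart.initializeApplication("app_2");

        System.out.println("active apps: " + shuffle.activeApps());
        System.out.println("connections: " + host.connectionCount());
    }
}
```

Since NmProxy implements the same interface as the service, NM code that calls the service would not need to know whether it runs in-process or out-of-proc, which is the point of keeping the existing API unchanged.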
