hadoop-yarn-issues mailing list archives

From "Gour Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6136) YARN registry service should avoid scanning whole ZK tree for every container/application finish
Date Wed, 01 Feb 2017 00:44:52 GMT

    [ https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847800#comment-15847800 ]

Gour Saha commented on YARN-6136:
---------------------------------

[~wangda] FYI, Slider today uses the following path:
{code}
/registry/users/{user-id}/services/org-apache-slider/{app-name}/components/{container-id}
{code}
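
For illustration, the path template above could be filled in as follows. This is a minimal sketch, not Slider's own code: the class name SliderRegistryPaths, the method componentPath, and the sample values are hypothetical, and any encoding the registry may apply to IDs is not modeled.

```java
public final class SliderRegistryPaths {
    private SliderRegistryPaths() {}

    /**
     * Fills in the path template quoted above. The arguments are inserted
     * verbatim; any ID encoding a real registry may apply is NOT modeled.
     */
    static String componentPath(String userId, String appName, String containerId) {
        // The leading empty element makes the joined path start with "/".
        return String.join("/",
                "", "registry", "users", userId,
                "services", "org-apache-slider", appName,
                "components", containerId);
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration only.
        System.out.println(componentPath("alice", "app1", "c1"));
    }
}
```

Each container record sits several levels below /registry, under a per-user and per-application prefix.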

> YARN registry service should avoid scanning whole ZK tree for every container/application finish
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6136
>                 URL: https://issues.apache.org/jira/browse/YARN-6136
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Critical
>
> In the existing registry service implementation, the purge operation is triggered by the container finish event:
> {code}
>   public void onContainerFinished(ContainerId id) throws IOException {
>     LOG.info("Container {} finished, purging container-level records",
>         id);
>     purgeRecordsAsync("/",
>         id.toString(),
>         PersistencePolicies.CONTAINER);
>   }
> {code} 
> Because this runs on every container finish, it essentially scans all (or almost all) ZK nodes from the root.
> We have a cluster that has hundreds of ZK nodes for the service registry and 20K+ ZK nodes for other purposes. The existing implementation can generate a massive number of ZK operations, as well as many internal Java objects (RegistryPathStatus). The RM becomes very unstable when container finish events arrive in batches, because of full GC pauses and ZK connection failures.
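
To make the cost described above concrete, here is a rough in-memory sketch that counts how many nodes a purge walk would touch depending on where it starts. It does not use ZooKeeper or the registry API, and it is not the fix adopted for this issue. The class PurgeScopeSketch, its methods, the tree layout, and the narrower per-user start path are all assumptions for illustration, with node counts chosen to resemble the numbers in the description.

```java
import java.util.ArrayList;
import java.util.List;

public final class PurgeScopeSketch {
    private PurgeScopeSketch() {}

    /** Counts the paths a full walk starting at {@code root} would visit. */
    static int nodesVisited(List<String> allPaths, String root) {
        String prefix = root.endsWith("/") ? root : root + "/";
        int visited = 0;
        for (String path : allPaths) {
            if (path.equals(root) || path.startsWith(prefix)) {
                visited++;
            }
        }
        return visited;
    }

    /** Hypothetical tree: a small registry subtree plus 20K unrelated nodes. */
    static List<String> sampleTree() {
        List<String> paths = new ArrayList<>();
        String base = "/registry/users/alice/services/org-apache-slider/app1/components";
        // Add every ancestor of the components node, then the node itself.
        int slash = base.indexOf('/', 1);
        while (slash != -1) {
            paths.add(base.substring(0, slash));
            slash = base.indexOf('/', slash + 1);
        }
        paths.add(base);
        for (int i = 0; i < 200; i++) {
            paths.add(base + "/c" + i); // container-level records
        }
        paths.add("/other");
        for (int i = 0; i < 20000; i++) {
            paths.add("/other/node" + i); // nodes used for other purposes
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> tree = sampleTree();
        System.out.println("from /: " + nodesVisited(tree, "/"));
        System.out.println("from user subtree: "
                + nodesVisited(tree, "/registry/users/alice"));
    }
}
```

In this sketch, a walk from "/" touches every node in the tree on each container finish, while a walk from a narrower subtree touches only the registry records beneath it.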


