hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14608) LLAP: slow scheduling due to LlapTaskScheduler not removing nodes on kill
Date Fri, 02 Sep 2016 21:15:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459610#comment-15459610 ]

Sergey Shelukhin commented on HIVE-14608:
-----------------------------------------

[~sseth] I can actually see problems caused by this. Easy repro: start LLAP (e.g. 7 nodes),
start the session (with the AM), flex LLAP down (e.g. to 4), and run a query. Scheduling can
be delayed significantly, and the whole job can slow down a lot, because the removed nodes are
never taken out of instanceToNodeMap:
{noformat}
2016-09-02 16:51:41,428 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager]
|tezplugins.LlapTaskSchedulerService|: Setting up node: DynamicServiceInstance [alive=true,
host=cn109... with resources=<memory:83968, vCores:16>, shufflePort=15551, servicesAddress=...,
mgmtPort=15004] with available capacity=16, pendingQueueSize=null, memory=83968
...
(of course nothing is actually removed)
2016-09-02 16:52:01,490 [INFO] [StateChangeNotificationHandler] |tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|:
Removed node with identity: f9b37b46-f629-4460-862f-f34183ba0a24
2016-09-02 16:52:01,567 [INFO] [StateChangeNotificationHandler] |tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|:
Removed node with identity: 12399334-c743-4a9b-8224-8c0cbc21dea7
2016-09-02 16:52:01,776 [INFO] [StateChangeNotificationHandler] |tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|:
Removed node with identity: c7b50156-b4f9-4353-89a4-3d1a1ccea604
...
2016-09-02 16:53:39,511 [INFO] [LlapScheduler] |tezplugins.LlapTaskSchedulerService|: Assigned
task TaskInfo{task=attempt_1466700718395_1343_2_07_000000_1, priority=140, startTime=0, containerId=null,
assignedInstance=null, uniqueId=24, localityDelayTimeout=0} to container container_222212222_1343_01_000025
on node=DynamicServiceInstance [alive=true, host=cn109... with resources=<memory:83968,
vCores:16>, shufflePort=15551, servicesAddress=..., mgmtPort=15004]
{noformat}
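To see why leftover map entries hurt, here is a minimal sketch of the repro above in plain Java. It does not use Hive's actual classes: it assumes a scheduler-side map keyed by worker identity (standing in for instanceToNodeMap) and a separate set of live workers (standing in for the registry), with 7 workers registered and then flexed down to 4. The class `StaleNodeDemo` and the names `liveWorkers` and `staleCandidates` are hypothetical; the real LlapTaskSchedulerService logic is more involved.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Hive's LlapTaskSchedulerService.
public class StaleNodeDemo {
    // Scheduler-side view: worker identity -> host (stands in for instanceToNodeMap).
    final Map<String, String> instanceToNodeMap = new HashMap<>();
    // Registry-side view: identities of workers that are actually alive.
    final Set<String> liveWorkers = new HashSet<>();

    void workerNodeAdded(String identity, String host) {
        liveWorkers.add(identity);
        instanceToNodeMap.put(identity, host);
    }

    // Mirrors the disabled code: the registry drops the worker,
    // but the scheduler's map keeps the entry.
    void workerNodeRemoved(String identity) {
        liveWorkers.remove(identity);
        // instanceToNodeMap.remove(identity);  // disabled, as in the FIXME
    }

    // Counts scheduling candidates that point at workers no longer alive.
    int staleCandidates() {
        int stale = 0;
        for (String id : instanceToNodeMap.keySet()) {
            if (!liveWorkers.contains(id)) {
                stale++;
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        StaleNodeDemo demo = new StaleNodeDemo();
        for (int i = 0; i < 7; i++) {
            demo.workerNodeAdded("worker-" + i, "host-" + i);
        }
        // Flex down from 7 to 4 nodes.
        for (int i = 4; i < 7; i++) {
            demo.workerNodeRemoved("worker-" + i);
        }
        // Prints "7 candidates, 3 stale".
        System.out.println(demo.instanceToNodeMap.size() + " candidates, "
                + demo.staleCandidates() + " stale");
    }
}
```

Under these assumptions, three of the seven candidates the scheduler can still choose from are dead, so any task placed on one of them has to fail or time out before it is rescheduled elsewhere, which would explain the kind of delay described above.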

> LLAP: slow scheduling due to LlapTaskScheduler not removing nodes on kill 
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14608
>                 URL: https://issues.apache.org/jira/browse/HIVE-14608
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> ...and presumably doesn't disable them for scheduling. I haven't looked in detail, though;
> I just see some harmless killed tasks in queries after I manually kill some LLAP nodes
> between queries.
> {noformat}
>   public void workerNodeRemoved(ServiceInstance serviceInstance) {
>      // FIXME: disabling this for now
> // instanceToNodeMap.remove(serviceInstance.getWorkerIdentity());
> {noformat}
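For contrast, a minimal sketch of what re-enabling the commented-out removal could look like, again with hypothetical simplified types rather than Hive's real ones. Here `workerNodeRemoved` drops the entry keyed by the worker identity, so a selection pass over the map can no longer pick the removed worker. `SchedulerSketch`, `NodeInfo`, `availableSlots`, and `selectNode` are invented for illustration; the real scheduler also tracks tasks already running on a node, which an actual fix would need to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; only the names workerNodeRemoved and
// instanceToNodeMap come from the issue, everything else is invented.
public class SchedulerSketch {
    static final class NodeInfo {
        final String host;
        int availableSlots;

        NodeInfo(String host, int availableSlots) {
            this.host = host;
            this.availableSlots = availableSlots;
        }
    }

    // Insertion-ordered so node selection is deterministic in this sketch.
    final Map<String, NodeInfo> instanceToNodeMap = new LinkedHashMap<>();

    void workerNodeAdded(String workerIdentity, String host, int slots) {
        instanceToNodeMap.put(workerIdentity, new NodeInfo(host, slots));
    }

    void workerNodeRemoved(String workerIdentity) {
        // Re-enabled: once the registry reports the worker gone,
        // it stops being a scheduling candidate.
        instanceToNodeMap.remove(workerIdentity);
    }

    // Returns the host of the first node with a free slot, or null if none.
    String selectNode() {
        for (NodeInfo node : instanceToNodeMap.values()) {
            if (node.availableSlots > 0) {
                node.availableSlots--;
                return node.host;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SchedulerSketch s = new SchedulerSketch();
        s.workerNodeAdded("w1", "host-a", 16);
        s.workerNodeAdded("w2", "host-b", 16);
        s.workerNodeRemoved("w1");
        System.out.println(s.selectNode()); // prints host-b
    }
}
```

Removing the entry outright is the simplest option; an alternative would be to keep the entry but mark it disabled, which the issue description hints the current code does not do either.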



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
