hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-10649) LLAP: AM gets stuck completely if one node is dead
Date Thu, 07 May 2015 21:56:59 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-10649:
------------------------------------
    Description: 
See HIVE-10648.
When the AM cannot connect to a node, that appears to cause it to stall. In the example log
below there are no other interleaving log lines, even though this happens in the middle of
Map 1 on TPCH q1, i.e. there are plenty of tasks scheduled.
From the "Assigning" messages I can also see that tasks are scheduled to all the nodes before
and after the pause, not just to the problematic node.
The LLAP daemons have corresponding gaps: between two fragments, nothing is run for a long
time on any daemon.
{noformat}
2015-05-07 12:13:46,679 INFO [Dispatcher thread: Central] impl.TaskImpl: task_1429683757595_0784_1_00_000276
Task Transitioned from SCHEDULED to RUNNING due to event T_ATTEMPT_LAUNCHED
2015-05-07 12:13:46,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 10 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:46,955 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:47,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 11 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:48,812 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 12 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:49,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 13 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:50,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 14 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:51,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 15 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:52,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 16 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:53,815 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 17 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:54,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 18 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:55,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 19 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 20 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,971 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:57,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 21 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:58,818 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 22 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:59,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 23 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:00,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 24 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:01,820 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 25 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:02,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 26 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:03,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 27 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:04,822 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 28 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:05,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 29 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:06,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 30 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:06,984 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:14:07,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 31 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:08,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 32 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:09,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 33 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:10,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 34 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:11,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 35 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:12,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 36 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:13,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 37 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:14,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 38 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:15,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 39 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:16,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 40 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:16,996 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:14:17,829 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 41 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:18,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 42 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:19,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 43 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:20,831 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 44 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:21,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 45 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:22,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 46 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:23,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 47 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:24,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 48 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:25,834 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 49 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:25,836 INFO [TaskCommunicator # 3] tezplugins.LlapTaskCommunicator: Unable
to run task: attempt_1429683757595_0784_1_00_000017_0 on containerId: container_222212222_0784_01_000018,
Communication Error
2015-05-07 12:14:25,841 INFO [Dispatcher thread: Central] history.HistoryEventHandler: [HISTORY][DAG:dag_1429683757595_0784_1][Event:TASK_ATTEMPT_FINISHED]:
vertexName=Map 1, taskAttemptId=attempt_1429683757595_0784_1_00_000017_0, startTime=1431026014322,
finishTime=1431026065838, timeTaken=51516, status=KILLED, errorEnum=COMMUNICATION_ERROR, diagnostics=Communication
Error, counters=Counters: 1, org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=1
{noformat}
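
A rough sanity check on the length of the stall, as a sketch only. It takes the first and last retry lines from the log above (rejoined onto single lines), reads the retry policy parameters out of them, and compares the resulting sleep budget with the timeTaken reported for the killed attempt. The helper names (parse_retries, elapsed_ms, sleep_budget_ms) are made up for illustration, and the budget assumes one fixed sleep per retry with connection attempts that fail quickly; it is not a description of how the Hadoop IPC client is implemented.

```python
import re
from datetime import datetime, timedelta

# First and last retry lines from the log above, rejoined onto single lines.
LOG = """\
2015-05-07 12:13:46,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:25,834 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
"""

# startTime / finishTime from the TASK_ATTEMPT_FINISHED history event above.
START_MS = 1431026014322
FINISH_MS = 1431026065838

RETRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Already tried (\d+) time\(s\); "
    r"retry policy is RetryUpToMaximumCountWithFixedSleep"
    r"\(maxRetries=(\d+), sleepTime=(\d+) MILLISECONDS\)"
)


def parse_retries(text):
    """Return (timestamp, attempt, max_retries, sleep_ms) for each retry line."""
    rows = []
    for line in text.splitlines():
        m = RETRY_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            rows.append((ts, int(m.group(2)), int(m.group(3)), int(m.group(4))))
    return rows


def elapsed_ms(first, last):
    """Whole milliseconds between two timestamps."""
    return (last - first) // timedelta(milliseconds=1)


def sleep_budget_ms(max_retries, sleep_ms):
    """Total sleep time if every retry is used (assumes one fixed sleep per retry)."""
    return max_retries * sleep_ms


if __name__ == "__main__":
    rows = parse_retries(LOG)
    (t0, n0, max_retries, sleep_ms), (t1, n1, _, _) = rows[0], rows[-1]
    print("retries observed:", n1 - n0)
    print("elapsed between them (ms):", elapsed_ms(t0, t1))
    print("sleep budget (ms):", sleep_budget_ms(max_retries, sleep_ms))
    print("attempt timeTaken (ms):", FINISH_MS - START_MS)
```

The two retry lines are 39,023 ms apart over 39 retries, so about one second per retry, and the 50,000 ms sleep budget covers most of the 51,516 ms the attempt took before it was killed. That fits the attempt spending nearly its whole lifetime in the connect retry loop; it does not by itself explain why nothing else was logged during that window.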

  was:
See HIVE-10648.
When AM cannot connect to a node, that appears to cause it to stall.
{noformat}
2015-05-07 12:13:46,679 INFO [Dispatcher thread: Central] impl.TaskImpl: task_1429683757595_0784_1_00_000276
Task Transitioned from SCHEDULED to RUNNING due to event T_ATTEMPT_LAUNCHED
2015-05-07 12:13:46,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 10 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:46,955 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:47,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 11 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:48,812 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 12 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:49,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 13 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:50,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 14 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:51,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 15 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:52,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 16 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:53,815 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 17 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:54,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 18 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:55,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 19 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 20 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,971 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:57,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 21 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:58,818 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 22 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:59,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 23 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:00,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 24 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:01,820 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 25 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:02,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 26 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:03,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 27 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:04,822 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 28 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:05,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 29 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:06,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 30 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:06,984 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:14:07,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 31 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:08,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 32 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:09,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 33 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:10,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 34 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:11,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 35 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:12,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 36 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:13,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 37 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:14,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 38 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:15,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 39 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:16,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 40 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:16,996 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
2015-05-07 12:14:17,829 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 41 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:18,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 42 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:19,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 43 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:20,831 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 44 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:21,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 45 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:22,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 46 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:23,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 47 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:24,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 48 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:25,834 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 49 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:25,836 INFO [TaskCommunicator # 3] tezplugins.LlapTaskCommunicator: Unable
to run task: attempt_1429683757595_0784_1_00_000017_0 on containerId: container_222212222_0784_01_000018,
Communication Error
2015-05-07 12:14:25,841 INFO [Dispatcher thread: Central] history.HistoryEventHandler: [HISTORY][DAG:dag_1429683757595_0784_1][Event:TASK_ATTEMPT_FINISHED]:
vertexName=Map 1, taskAttemptId=attempt_1429683757595_0784_1_00_000017_0, startTime=1431026014322,
finishTime=1431026065838, timeTaken=51516, status=KILLED, errorEnum=COMMUNICATION_ERROR, diagnostics=Communication
Error, counters=Counters: 1, org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=1
{noformat}


> LLAP: AM gets stuck completely if one node is dead
> --------------------------------------------------
>
>                 Key: HIVE-10649
>                 URL: https://issues.apache.org/jira/browse/HIVE-10649
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> See HIVE-10648.
> When the AM cannot connect to a node, that appears to cause it to stall. In the example log
> below there are no other interleaving log lines, even though this happens in the middle of
> Map 1 on TPCH q1, i.e. there are plenty of tasks scheduled.
> From the "Assigning" messages I can also see that tasks are scheduled to all the nodes before
> and after the pause, not just to the problematic node.
> The LLAP daemons have corresponding gaps: between two fragments, nothing is run for a long
> time on any daemon.
> {noformat}
> 2015-05-07 12:13:46,679 INFO [Dispatcher thread: Central] impl.TaskImpl: task_1429683757595_0784_1_00_000276
Task Transitioned from SCHEDULED to RUNNING due to event T_ATTEMPT_LAUNCHED
> 2015-05-07 12:13:46,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 10 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:46,955 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
> 2015-05-07 12:13:47,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 11 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:48,812 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 12 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:49,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 13 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:50,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 14 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:51,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 15 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:52,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 16 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:53,815 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 17 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:54,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 18 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:55,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 19 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:56,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 20 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:56,971 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
> 2015-05-07 12:13:57,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 21 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:58,818 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 22 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:13:59,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 23 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:00,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 24 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:01,820 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 25 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:02,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 26 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:03,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 27 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:04,822 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 28 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:05,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 29 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:06,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 30 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:06,984 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
> 2015-05-07 12:14:07,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 31 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:08,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 32 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:09,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 33 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:10,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 34 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:11,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 35 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:12,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 36 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:13,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 37 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:14,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 38 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:15,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 39 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:16,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 40 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:16,996 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting
to refresh ServiceInstanceSet 1611673583
> 2015-05-07 12:14:17,829 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 41 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:18,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 42 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:19,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 43 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:20,831 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 44 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:21,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 45 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:22,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 46 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:23,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 47 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:24,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 48 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:25,834 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server:
cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 49 time(s); retry policy
is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-05-07 12:14:25,836 INFO [TaskCommunicator # 3] tezplugins.LlapTaskCommunicator:
Unable to run task: attempt_1429683757595_0784_1_00_000017_0 on containerId: container_222212222_0784_01_000018,
Communication Error
> 2015-05-07 12:14:25,841 INFO [Dispatcher thread: Central] history.HistoryEventHandler:
[HISTORY][DAG:dag_1429683757595_0784_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 1, taskAttemptId=attempt_1429683757595_0784_1_00_000017_0,
startTime=1431026014322, finishTime=1431026065838, timeTaken=51516, status=KILLED, errorEnum=COMMUNICATION_ERROR,
diagnostics=Communication Error, counters=Counters: 1, org.apache.tez.common.counters.DAGCounter,
DATA_LOCAL_TASKS=1
> {noformat}
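
As a rough sanity check on the log above, the length of the stall can be compared with the retry policy it prints. The sketch below is illustrative only: it assumes the retry loop sleeps a fixed interval between attempts and that each failed connect returns quickly, and every name in it is hypothetical rather than taken from the Hive or Tez code. The constants are copied from the log lines.

```python
# Values copied from the log excerpt; all names here are illustrative.
MAX_RETRIES = 50            # maxRetries in RetryUpToMaximumCountWithFixedSleep
SLEEP_MS = 1000             # sleepTime, in milliseconds
START_MS = 1431026014322    # startTime of the killed task attempt
FINISH_MS = 1431026065838   # finishTime of the killed task attempt


def retry_window_ms(max_retries: int, sleep_ms: int) -> int:
    """Approximate time spent sleeping between retries.

    Assumption: a fixed sleep per retry and negligible time per failed connect.
    """
    return max_retries * sleep_ms


expected_ms = retry_window_ms(MAX_RETRIES, SLEEP_MS)
observed_ms = FINISH_MS - START_MS      # matches timeTaken in the history event
difference_ms = observed_ms - expected_ms

print(f"expected retry window: {expected_ms} ms")
print(f"observed attempt time: {observed_ms} ms")
print(f"difference:            {difference_ms} ms")
```

Under these assumptions the attempt's timeTaken is accounted for almost entirely by the retry window, which is consistent with the attempt being killed only once the connection retries were exhausted.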



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
