asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Taewoo Kim <wangs...@gmail.com>
Subject Requesting node X to shutdown to ensure failure?
Date Sun, 30 Sep 2018 17:52:06 GMT
Hi devs,

I saw the following log messages in the cc.log of the Cloudberrry cluster.
Yesterday around 9:50 pm, a task cluster was created and executed. And
around 9:51 pm, the cluster shows the following message. In the nc.log of
the node 2, there were no noticeable log messages. I wonder what could
cause this situation.

-- cc.log at 9:51 pm
*21:51:54.203 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2 to
shutdown to ensure failure*

*-- nc-2.log around 9:50 pm*
21:50:30.314 [Worker:2] INFO
 org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:2
21:50:30.375 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:3:0
21:50:30.382 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:3:0
21:50:30.390 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:2:0
21:50:30.391 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:2:0
21:50:30.392 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
21:50:30.392 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
21:50:30.398 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
21:50:30.398 [Worker:2] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
22:02:30.290 [JVM exit thread] INFO  org.apache.hyracks.util.ExitUtil - JVM
exiting with status 0; bye!


-- *cc.log from 9:51 pm*
21:50:30.313 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1
Task Clusters
21:50:30.313 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks:
[TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2,
TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5,
TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8,
TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1,
TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4,
TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7,
TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0,
TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3,
TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6,
TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]
21:50:30.313 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots:
[TC:[TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2,
TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5,
TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8,
TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1,
TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4,
TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7,
TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0,
TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3,
TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6,
TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]],
inProgressTaskClusters: []
21:50:30.366 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:5:0]
21:50:30.374 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:9:0]
21:50:30.376 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:3:0]
21:50:30.376 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:8:0]
21:50:30.377 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:0:0]
21:50:30.378 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:4:0]
21:50:30.381 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:1:0]
21:50:30.382 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:8:0]
21:50:30.383 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:3:0]
21:50:30.384 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:4:0]
21:50:30.386 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:5:0]
21:50:30.386 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:9:0]
21:50:30.389 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:6:0]
21:50:30.389 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:0:0]
21:50:30.392 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:2:0]
21:50:30.392 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:7:0]
21:50:30.392 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:6:0]
21:50:30.394 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:1:0]
21:50:30.395 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:2:0]
21:50:30.395 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:7:0]
21:50:30.396 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:1:0]
21:50:30.396 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:0:0]
21:50:30.396 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:0:0]
21:50:30.396 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:1:0]
21:50:30.397 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:8:0]
21:50:30.398 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:4:0]
21:50:30.398 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:5:0]
21:50:30.398 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:4:0]
21:50:30.398 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:5:0]
21:50:30.399 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:6:0]
21:50:30.399 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:3:0]
21:50:30.399 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:2:0]
21:50:30.399 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:9:0]
21:50:30.399 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:8:0]
21:50:30.399 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:9:0]
21:50:30.402 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:7:0]
21:50:30.402 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:6:0]
21:50:30.402 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:7:0]
21:50:30.402 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
21:50:30.402 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
21:50:30.402 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
21:50:30.402 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
21:50:30.404 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:3:0]
21:50:30.404 [Worker:ClusterController] INFO
 org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:2:0]
*21:51:54.203 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2 to
shutdown to ensure failure*
21:51:54.206 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown
failed node 2 succeeded. false positive heartbeat miss indication
21:51:54.206 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - 2 considered dead.
Last heartbeat received 83799ms ago. Max miss period: 80000ms
21:51:54.206 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 5 to
shutdown to ensure failure
21:51:54.206 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown
failed node 5 succeeded. false positive heartbeat miss indication
21:51:54.206 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - 5 considered dead.
Last heartbeat received 83806ms ago. Max miss period: 80000ms
21:51:54.207 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 3 to
shutdown to ensure failure
21:51:54.207 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown
failed node 3 succeeded. false positive heartbeat miss indication
21:51:54.207 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - 3 considered dead.
Last heartbeat received 83804ms ago. Max miss period: 80000ms
21:51:54.207 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 4 to
shutdown to ensure failure
21:51:54.207 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown
failed node 4 succeeded. false positive heartbeat miss indication
21:51:54.207 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.cluster.NodeManager - 4 considered dead.
Last heartbeat received 83804ms ago. Max miss period: 80000ms
21:51:54.207 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of
affected jobs: 3
21:51:54.216 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job:
JID:0.1: {1=[TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0,
TAID:TID:ANID:ODID:3:0:1:0]}
21:51:54.216 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - Aborting:
[TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0,
TAID:TID:ANID:ODID:3:0:1:0] at 1
21:51:54.217 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing
uncommitted partitions: []
21:51:54.217 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing
partition requests: []
21:51:54.218 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - aborting doomed task
clusters
21:51:54.218 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - number of doomed task
clusters found = 0
21:51:54.218 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots:
[TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:3:0:0, TID:ANID:ODID:3:0:1,
TID:ANID:ODID:3:0:2, TID:ANID:ODID:3:0:3, TID:ANID:ODID:3:0:4,
TID:ANID:ODID:3:0:5, TID:ANID:ODID:3:0:6, TID:ANID:ODID:3:0:7,
TID:ANID:ODID:3:0:8, TID:ANID:ODID:3:0:9]], inProgressTaskClusters: []
21:51:54.218 [Worker:ClusterController] ERROR
org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected failure.
Aborting job JID:0.1
org.apache.hyracks.api.exceptions.HyracksException: Node 2 not live
        at
org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
        at
org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
        at
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
        at
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
        at
org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:731)
[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
        at
org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60)
[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
        at
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
[hyracks-control-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
21:51:54.218 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job:
JID:0.700: {}
21:51:54.218 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing
uncommitted partitions: []
21:51:54.218 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing
partition requests: []
21:51:54.218 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - aborting doomed task
clusters
21:51:54.219 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - number of doomed task
clusters found = 0
21:51:54.219 [Worker:ClusterController] INFO
 org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job:
JID:0.700: {1=[TAID:TID:ANID:ODID:2:0:0:0, TAID:TID:ANID:ODID:2:0:1:0]}

Best,
Taewoo

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message