spark-issues mailing list archives

From "Jepson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
Date Tue, 22 Aug 2017 04:30:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136260#comment-16136260 ]

Jepson edited comment on SPARK-21733 at 8/22/17 4:29 AM:
---------------------------------------------------------

*The NodeManager log detail:*


{code:java}
2017-08-22 11:20:07,984 DEBUG org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: [ 17040
16747 ]
2017-08-22 11:20:07,984 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 16747 for container-id container_e56_1503371613444_0001_01_000002:
586.8 MB of 3 GB physical memory used; 4.5 GB of 6.3 GB virtual memory used
2017-08-22 11:20:07,984 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Constructing ProcessTree for : PID = 16766 ContainerId = container_e56_1503371613444_0001_01_000003
2017-08-22 11:20:07,992 DEBUG org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: [ 17066
16766 ]
2017-08-22 11:20:07,992 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 16766 for container-id container_e56_1503371613444_0001_01_000003:
580.4 MB of 3 GB physical memory used; 4.6 GB of 6.3 GB virtual memory used
2017-08-22 11:20:08,716 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Node's health-status : true, 
2017-08-22 11:20:08,717 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out 3 container statuses: [ContainerStatus: [ContainerId: container_e56_1503371613444_0001_01_000001,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ], ContainerStatus: [ContainerId: container_e56_1503371613444_0001_01_000002,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ], ContainerStatus: [ContainerId: container_e56_1503371613444_0001_01_000003,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 11:20:08,717 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 102: Call -> hadoop37.jiuye/192.168.17.37:8031:
nodeHeartbeat {node_status { node_id { host: "hadoop44.jiuye" port: 8041 } response_id: 389
containersStatuses { container_id { app_attempt_id { application_id { id: 1 cluster_timestamp:
1503371613444 } attemptId: 1 } id: 61572651155457 } state: C_RUNNING diagnostics: "" exit_status:
-1000 } containersStatuses { container_id { app_attempt_id { application_id { id: 1 cluster_timestamp:
1503371613444 } attemptId: 1 } id: 61572651155458 } state: C_RUNNING diagnostics: "" exit_status:
-1000 } containersStatuses { container_id { app_attempt_id { application_id { id: 1 cluster_timestamp:
1503371613444 } attemptId: 1 } id: 61572651155459 } state: C_RUNNING diagnostics: "" exit_status:
-1000 } nodeHealthStatus { is_node_healthy: true health_report: "" last_health_report_time:
1503371969299 } } last_known_container_token_master_key { key_id: -966413074 bytes: "a\021&\346gs\031n"
} last_known_nm_token_master_key { key_id: -1126930838 bytes: "$j@\322\331dr`" }}
2017-08-22 11:20:08,717 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1778801068) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #851
2017-08-22 11:20:08,720 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1778801068) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #851
2017-08-22 11:20:08,720 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat
took 3ms
2017-08-22 11:20:08,720 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 102: Response <-
hadoop37.jiuye/192.168.17.37:8031: nodeHeartbeat {response_id: 390 nodeAction: NORMAL containers_to_cleanup
{ app_attempt_id { application_id { id: 1 cluster_timestamp: 1503371613444 } attemptId: 1
} id: 61572651155458 } containers_to_cleanup { app_attempt_id { application_id { id: 1 cluster_timestamp:
1503371613444 } attemptId: 1 } id: 61572651155459 } nextHeartBeatInterval: 1000}
2017-08-22 11:20:08,721 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.CMgrCompletedContainersEvent.EventType: FINISH_CONTAINERS
*{color:#f6c342}2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerKillEvent.EventType:
KILL_CONTAINER{color}*
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_e56_1503371613444_0001_01_000002 of type KILL_CONTAINER
2017-08-22 11:20:08,722 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_e56_1503371613444_0001_01_000002 transitioned from RUNNING to KILLING
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerKillEvent.EventType:
KILL_CONTAINER
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_e56_1503371613444_0001_01_000003 of type KILL_CONTAINER
2017-08-22 11:20:08,722 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_e56_1503371613444_0001_01_000003 transitioned from RUNNING to KILLING
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
CLEANUP_CONTAINER
2017-08-22 11:20:08,722 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_e56_1503371613444_0001_01_000002
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Marking container container_e56_1503371613444_0001_01_000002 as inactive
{code}
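The KILL_CONTAINER / CLEANUP_CONTAINER sequence above is easier to follow when the NodeManager log is filtered down to one container. A minimal sketch (the function name is my own, and log paths vary by distribution; on CDH the NodeManager log usually sits under /var/log/hadoop-yarn):

```shell
# Filter a NodeManager log down to the kill/cleanup lifecycle of a single
# container. Usage: container_lifecycle <log-file> <container-id>
container_lifecycle() {
  # First pass keeps lifecycle-relevant lines, second pass keeps only the
  # requested container id (fixed-string match, so underscores are literal).
  grep -E 'KILL_CONTAINER|CLEANUP_CONTAINER|transitioned from|Cleaning up container' "$1" \
    | grep -F "$2"
}
```

Against the excerpt above, `container_lifecycle yarn-nodemanager.log container_e56_1503371613444_0001_01_000002` would surface the RUNNING-to-KILLING transition and the cleanup lines in order.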





> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -----------------------------------------------------------------
>
>                 Key: SPARK-21733
>                 URL: https://issues.apache.org/jira/browse/SPARK-21733
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.1.1
>            Environment: Apache Spark 2.1.1
> CDH 5.12.0 YARN
>            Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka + Spark Streaming throws these errors:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored as bytes
in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8003 took
11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as values in memory
(estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending
offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 (TID 64178).
1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast variable
8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored as bytes
in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8004 took
8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as values in memory
(estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending
offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 (TID 64186).
1740 bytes result sent to driver
> 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}
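The NodeManager trace quoted earlier in this comment was captured at DEBUG/TRACE level. For anyone trying to reproduce that level of detail, the relevant loggers can be raised in the NodeManager's log4j.properties; a sketch following standard Hadoop log4j 1.x conventions (the exact file location varies by distribution):

```properties
# log4j.properties fragment (NodeManager side): raise the YARN NodeManager
# packages to DEBUG and the IPC engine to TRACE, matching the log above.
log4j.logger.org.apache.hadoop.yarn.server.nodemanager=DEBUG
log4j.logger.org.apache.hadoop.ipc.ProtobufRpcEngine=TRACE
```

TRACE on the RPC engine is what exposes the nodeHeartbeat request/response bodies, including the containers_to_cleanup list that precedes the KILL_CONTAINER events.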



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

