spark-issues mailing list archives

From "Jepson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
Date Tue, 22 Aug 2017 07:48:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136400#comment-16136400 ]

Jepson edited comment on SPARK-21733 at 8/22/17 7:47 AM:
---------------------------------------------------------

[~jerryshao]  Thanks for your quick reply.
The Spark Streaming with Kafka code (Scala):

      scc.start()
      scc.awaitTermination()

*1. I set these parameters:*
--driver-memory 4g     \
--executor-memory 4g     \
--executor-cores  4   \
--num-executors 4 \
--conf "spark.yarn.am.memory=1024m" \
--conf "spark.yarn.am.memoryOverhead=1024m" \
--conf "spark.yarn.driver.memoryOverhead=4096m" \
--conf "spark.yarn.executor.memoryOverhead=4096m" \
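
As a sanity check on these settings, here is a minimal Python sketch (not part of the original report) of how the executor container request on YARN is normally derived: executor heap plus memory overhead. The helper name `parse_size_mb` is hypothetical, and the sketch ignores any rounding YARN applies to its allocation increment. The result, 8192 MB, is consistent with the "8 GB physical memory" limit that the NodeManager log in this thread reports for the container.

```python
def parse_size_mb(text):
    """Convert a size such as '4g' or '4096m' to MiB (only m/g suffixes handled)."""
    units = {"m": 1, "g": 1024}
    return int(text[:-1]) * units[text[-1].lower()]

# Values copied from the spark-submit flags above.
executor_memory_mb = parse_size_mb("4g")        # --executor-memory
executor_overhead_mb = parse_size_mb("4096m")   # spark.yarn.executor.memoryOverhead

# Memory requested from YARN for each executor container.
container_request_mb = executor_memory_mb + executor_overhead_mb
print(container_request_mb)  # 8192
```

Note that `spark.yarn.am.memory` and `spark.yarn.am.memoryOverhead` apply only in client mode, while `spark.yarn.driver.memoryOverhead` applies only in cluster mode, so not all of these flags take effect in a single run.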

*2. The error occurred again.*



was (Author: 1028344078@qq.com):
[~jerryshao]  Thanks for your quick reply.
The Spark Streaming with Kafka code (Scala):

      scc.start()
      scc.awaitTermination()

*1. I set these parameters:*
--driver-memory 4g     \
--executor-memory 4g     \
--executor-cores  4   \
--num-executors 4 \
--conf "spark.yarn.am.memory=1024m" \
--conf "spark.yarn.am.memoryOverhead=1024m" \
--conf "spark.yarn.driver.memoryOverhead=4096m" \
--conf "spark.yarn.executor.memoryOverhead=4096m" \

*2. The error occurred again:*
2017-08-22 15:06:32,082 *INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 5382 for container-id container_e65_1503383442059_0002_01_000006:
573.9 MB of 8 GB physical memory used; 6.2 GB of 40 GB virtual memory used*
2017-08-22 15:06:33,026 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Node's health-status : true, 
2017-08-22 15:06:33,026 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:33,026 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3069
2017-08-22 15:06:33,027 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3069
2017-08-22 15:06:33,027 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat
took 1ms
2017-08-22 15:06:34,028 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Node's health-status : true, 
2017-08-22 15:06:34,028 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:34,028 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3070
2017-08-22 15:06:34,029 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3070
2017-08-22 15:06:34,029 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat
took 1ms
2017-08-22 15:06:35,030 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Node's health-status : true, 
2017-08-22 15:06:35,030 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:35,030 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3071
2017-08-22 15:06:35,031 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3071
2017-08-22 15:06:35,031 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat
took 1ms
2017-08-22 15:06:35,084 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Current ProcessTree list : [ 5382 ]
2017-08-22 15:06:35,084 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Constructing ProcessTree for : PID = 5382 ContainerId = container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:35,092 DEBUG org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: [ 5382 5532
]
2017-08-22 15:06:35,092 *INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 5382 for container-id container_e65_1503383442059_0002_01_000006:
573.9 MB of 8 GB physical memory used; 6.2 GB of 40 GB virtual memory used*
2017-08-22 15:06:36,031 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Node's health-status : true, 
2017-08-22 15:06:36,032 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:36,032 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3072
2017-08-22 15:06:36,032 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3072
2017-08-22 15:06:36,033 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat
took 1ms
2017-08-22 15:06:37,037 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Node's health-status : true, 
2017-08-22 15:06:37,037 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:37,037 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3073
2017-08-22 15:06:37,038 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3073
2017-08-22 15:06:37,038 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat
took 1ms
2017-08-22 15:06:37,564 DEBUG org.apache.hadoop.ipc.Server: IPC Server idle connection scanner
for port 8040: task running
2017-08-22 15:06:37,691 DEBUG org.apache.hadoop.ipc.Server: IPC Server idle connection scanner
for port 8041: task running
2017-08-22 15:06:38,040 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Node's health-status : true, 
2017-08-22 15:06:38,040 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006,
State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:38,040 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3074
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection
to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3074
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat
took 1ms
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.CMgrCompletedContainersEvent.EventType: FINISH_CONTAINERS
2017-08-22 15:06:38,041* DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerKillEvent.EventType:
KILL_CONTAINER*
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_e65_1503383442059_0002_01_000006 of type KILL_CONTAINER
2017-08-22 15:06:38,042 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_e65_1503383442059_0002_01_000006 transitioned from RUNNING to KILLING
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
CLEANUP_CONTAINER
2017-08-22 15:06:38,042 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Marking container container_e65_1503383442059_0002_01_000006 as inactive
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Getting pid for container container_e65_1503383442059_0002_01_000006 to kill from pid file
/yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Accessing pid for container container_e65_1503383442059_0002_01_000006 from pid file /yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.ProcessIdFileReader:
Accessing pid from pid file /yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.ProcessIdFileReader:
Got pid 5382 from path /yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Got pid 5382 for container container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Sending signal to pid 5382 as user hdfs for container container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Sending signal 15 to pid 5382 as user hdfs
2017-08-22 15:06:38,046 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Sent signal SIGTERM to pid 5382 as user hdfs for container container_e65_1503383442059_0002_01_000006,
result=success
2017-08-22 15:06:38,046 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction
as:yarn (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:338)
2017-08-22 15:06:38,048 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Exit code from container container_e65_1503383442059_0002_01_000006 is : 143
2017-08-22 15:06:38,048 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_e65_1503383442059_0002_01_000006 of type UPDATE_DIAGNOSTICS_MSG
2017-08-22 15:06:38,048 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Container container_e65_1503383442059_0002_01_000006 completed with exit code 143
2017-08-22 15:06:38,067 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType:
CONTAINER_KILLED_ON_REQUEST
2017-08-22 15:06:38,067 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_e65_1503383442059_0002_01_000006 of type CONTAINER_KILLED_ON_REQUEST
2017-08-22 15:06:38,067 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_e65_1503383442059_0002_01_000006 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2017-08-22 15:06:38,067 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the
event org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ContainerLocalizationCleanupEvent.EventType:
CLEANUP_CONTAINER_RESOURCES
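
To read the log above more easily, here is a small Python sketch (not part of the original comment) that parses one `ContainersMonitorImpl` memory line and decodes the container exit code. The function names are hypothetical, and the sample line is the message from the log joined onto a single line. It assumes the usual convention that an exit status above 128 means the process ended on signal `status - 128`.

```python
import re
import signal

# One ContainersMonitorImpl message from the log above, joined onto one line.
LINE = (
    "Memory usage of ProcessTree 5382 for container-id "
    "container_e65_1503383442059_0002_01_000006: "
    "573.9 MB of 8 GB physical memory used; "
    "6.2 GB of 40 GB virtual memory used"
)

UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024}

PATTERN = re.compile(
    r"([\d.]+) (\w+) of ([\d.]+) (\w+) physical memory used; "
    r"([\d.]+) (\w+) of ([\d.]+) (\w+) virtual memory used"
)

def to_mb(value, unit):
    return float(value) * UNIT_TO_MB[unit]

def parse_memory_usage(line):
    """Return used/limit figures in MB for physical and virtual memory."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a memory usage line")
    g = m.groups()
    return {
        "pmem_used_mb": to_mb(g[0], g[1]),
        "pmem_limit_mb": to_mb(g[2], g[3]),
        "vmem_used_mb": to_mb(g[4], g[5]),
        "vmem_limit_mb": to_mb(g[6], g[7]),
    }

def signal_from_exit_code(code):
    """Map an exit status above 128 to a signal name, else None."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None

usage = parse_memory_usage(LINE)
pmem_fraction = usage["pmem_used_mb"] / usage["pmem_limit_mb"]
vmem_pmem_ratio = usage["vmem_limit_mb"] / usage["pmem_limit_mb"]

print(f"physical memory used: {pmem_fraction:.1%}")  # 7.0%
print(f"vmem/pmem limit ratio: {vmem_pmem_ratio}")   # 5.0
print(signal_from_exit_code(143))                    # SIGTERM
```

At the last sample, about three seconds before the kill, the container was using roughly 7% of its physical limit and well under its virtual limit, so this excerpt does not show the NodeManager's memory monitor enforcing a limit. The exit is labelled `CONTAINER_KILLED_ON_REQUEST` and exit code 143 corresponds to the SIGTERM (signal 15) sent to pid 5382, which suggests, but does not prove, that the kill was requested from outside the NodeManager. The 40 GB virtual limit on an 8 GB container also suggests a `yarn.nodemanager.vmem-pmem-ratio` of 5 on this cluster; that is an inference from the numbers, not something the log states.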


> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -----------------------------------------------------------------
>
>                 Key: SPARK-21733
>                 URL: https://issues.apache.org/jira/browse/SPARK-21733
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.1.1
>         Environment: Apache Spark 2.1.1
> CDH5.12.0 Yarn
>            Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka + Spark Streaming throws this error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored as bytes
in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8003 took
11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as values in memory
(estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending
offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 (TID 64178).
1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast variable
8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored as bytes
in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8004 took
8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as values in memory
(estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending
offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 (TID 64186).
1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

