spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-18976) in standlone mode,executor expired by HeartbeanReceiver that still take up cores but no tasks assigned to
Date Tue, 27 Dec 2016 09:32:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Owen resolved SPARK-18976.
-------------------------------
    Resolution: Duplicate

> in standlone mode,executor expired by HeartbeanReceiver that still take up cores but
no tasks assigned to 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18976
>                 URL: https://issues.apache.org/jira/browse/SPARK-18976
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 1.6.1
>         Environment: jdk1.8.0_77 Red Hat 4.4.7-11
>            Reporter: liujianhui
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> h2. scene
> when executor expired by HeartbeatReceiver in driver, driver will mark that executor
as not live, task scheduler will not assign tasks to that executor, but that executor's status
will always be running and take up cores, the executor 18 was expired and no task running,
the task time far less than the normal executor 142, but in app page, the executor is running
> !screenshot-1.png!
> !screenshot-2.png!
> !screenshot-3.png!
> h2.process:
> # exeuctor expired by HearbeatReceiver because the last heartbeat execeed the executor
timeout
> # executor will be removed in CoarseGrainedSchdulerBackend.killExecutors, so that executor
will marked as dead, it will not scheduled as offer since now because it in executorsPendingToRemove
> # status of that executor is running because the CoarseGrainedExecutorBackend processor
is also exist and it register block manager to the driver every 10s, log as 
> {code}
> 16/12/22 17:04:26 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:26 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:26 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:26 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:26 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:36 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:36 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:36 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:36 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:36 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:46 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:46 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:46 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:46 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:46 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:56 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:56 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:56 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:56 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:56 INFO BlockManager: Reporting 0 blocks to the master. 
> {code}
> h2. resolve 
> when the register times exceed some threshold(e.g. 10), the executor should exit as zero




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message