spark-issues mailing list archives

From "hustfxj (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
Date Mon, 06 Mar 2017 05:21:32 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hustfxj updated SPARK-19831:
----------------------------
    Description: 
Cleaning up an application can take a long time on the worker. While the cleanup runs, the
worker cannot send heartbeats to the master, because the worker extends *ThreadSafeRpcEndpoint*
and therefore processes one message at a time. If a heartbeat from a worker is blocked by the
*ApplicationFinished* message, the master will think the worker is dead. If the worker is
running a driver, the master will then schedule that driver again. So I think this is a bug
in Spark. I suggest solving the problem as follows:

1. Run the application cleanup on a separate asynchronous thread such as 'cleanupThreadExecutor'.
Then it won't block other RPC messages like SendHeartbeat.

2. Do not send the heartbeat to the master over the RPC channel, because any other RPC message
may block that channel. Send the heartbeat to the master from a separate asynchronous timer
thread instead.
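
The blocking described above, and the effect of both suggestions, can be illustrated with
a small model. This is only a sketch in Python, not Spark code: the queue stands in for the
endpoint's inbox, the single loop thread stands in for a thread-safe endpoint that handles
one message at a time, and every name in it (heartbeat_latency, CLEANUP_SECONDS, the message
strings) is hypothetical.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CLEANUP_SECONDS = 0.3  # hypothetical cost of cleaning up one application


def heartbeat_latency(offload_cleanup=False, dedicated_heartbeat=False):
    """Return how long the "master" waits for a heartbeat while a cleanup is pending."""
    inbox = queue.Queue()
    arrived = []  # times at which the "master" saw a heartbeat
    cleanup_pool = ThreadPoolExecutor(max_workers=1)

    def cleanup():
        time.sleep(CLEANUP_SECONDS)  # pretend that deleting directories is slow

    def message_loop():
        # One thread handles one message at a time, like a thread-safe endpoint.
        while True:
            message = inbox.get()
            if message == "stop":
                return
            if message == "application_finished":
                if offload_cleanup:
                    cleanup_pool.submit(cleanup)  # suggestion 1: hand off and return
                else:
                    cleanup()  # every later message waits for this call
            elif message == "send_heartbeat":
                arrived.append(time.monotonic())

    # Queue the slow message first, then the heartbeat trigger behind it.
    inbox.put("application_finished")
    if not dedicated_heartbeat:
        inbox.put("send_heartbeat")
    inbox.put("stop")

    started = time.monotonic()
    loop = threading.Thread(target=message_loop)
    loop.start()
    if dedicated_heartbeat:
        # Suggestion 2: a separate thread reports to the master without using the inbox.
        beat = threading.Thread(target=lambda: arrived.append(time.monotonic()))
        beat.start()
        beat.join()
    loop.join()
    cleanup_pool.shutdown(wait=True)
    return arrived[0] - started
```

With the defaults, heartbeat_latency() should come out at roughly CLEANUP_SECONDS or more,
because the heartbeat trigger sits behind the cleanup in the inbox. With either
offload_cleanup=True or dedicated_heartbeat=True the result should drop to about the time
it takes to start a thread.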

  was:
Cleaning up an application can take a long time on the worker. While the cleanup runs, the
worker cannot send heartbeats or other RPC messages to the master, because the worker extends
*ThreadSafeRpcEndpoint*. If a heartbeat from a worker is blocked by the *ApplicationFinished*
message, the master will think the worker is dead. If the worker is running a driver, the
master will then schedule that driver again. So I think this is a bug in Spark. I suggest
solving the problem as follows:

1. Run the application cleanup on a separate asynchronous thread such as 'cleanupThreadExecutor'.
Then it won't block other RPC messages like SendHeartbeat.

2. Do not send the heartbeat to the master over the RPC channel, because any other RPC message
may block that channel. Send the heartbeat to the master from a separate asynchronous timer
thread instead.


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-19831
>                 URL: https://issues.apache.org/jira/browse/SPARK-19831
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: hustfxj
>
> Cleaning up an application can take a long time on the worker. While the cleanup runs, the
worker cannot send heartbeats to the master, because the worker extends *ThreadSafeRpcEndpoint*
and therefore processes one message at a time. If a heartbeat from a worker is blocked by the
*ApplicationFinished* message, the master will think the worker is dead. If the worker is
running a driver, the master will then schedule that driver again. So I think this is a bug
in Spark. I suggest solving the problem as follows:
> 1. Run the application cleanup on a separate asynchronous thread such as
'cleanupThreadExecutor'. Then it won't block other RPC messages like SendHeartbeat.
> 2. Do not send the heartbeat to the master over the RPC channel, because any other RPC
message may block that channel. Send the heartbeat to the master from a separate asynchronous
timer thread instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

