spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: All master are unreponsive issue
Date Sun, 05 Jul 2015 00:11:30 GMT
Currently the number of retries is hardcoded.

You may want to open a JIRA requesting that the retry count be made configurable.
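
For illustration only, here is a minimal sketch of what a configurable retry count could look like. It is not the actual AppClient code: the property key `spark.deploy.client.registrationRetries`, the `RegistrationSettings` object, and the plain `Map` standing in for Spark's configuration are all assumptions made for this example. The defaults mirror the hardcoded values quoted below.

```scala
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical sketch; names here are illustrative, not Spark's API.
object RegistrationSettings {
  // Defaults mirror the hardcoded values quoted in this thread.
  val DefaultTimeout: FiniteDuration = 20.seconds
  val DefaultRetries: Int = 3

  // Assumed property key; this thread does not confirm any such setting exists.
  val RetriesKey = "spark.deploy.client.registrationRetries"

  /** Reads the retry count from a settings map, falling back to the default
    * when the key is absent, not an integer, or negative. */
  def retries(settings: Map[String, String]): Int =
    settings.get(RetriesKey)
      .flatMap(v => Try(v.trim.toInt).toOption)
      .filter(_ >= 0)
      .getOrElse(DefaultRetries)
}

object Demo extends App {
  // No override supplied: the default is used.
  println(RegistrationSettings.retries(Map.empty))
  // Override supplied: the configured value is used.
  println(RegistrationSettings.retries(Map(RegistrationSettings.RetriesKey -> "10")))
}
```

Falling back to the default on a malformed value keeps existing behavior unchanged for anyone who does not set the property.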

Cheers

On Thu, Jul 2, 2015 at 8:35 PM, <luohui20001@sina.com> wrote:

> Hi there,
>
> I checked the source code and found that
> org.apache.spark.deploy.client.AppClient defines these parameters (line
> 52):
>
>   val REGISTRATION_TIMEOUT = 20.seconds
>
>   val REGISTRATION_RETRIES = 3
>
> As far as I can tell, if I want to increase the number of retries, I must
> modify this value, rebuild the entire Spark project, and then redeploy the
> Spark cluster with my modified version. Is that right?
>
> Or is there a better way to solve this issue?
>
> Thanks.
>
>
>
>
> --------------------------------
>
> Thanks & Best regards!
> San.Luo
>
> ----- Original Message -----
> From: <luohui20001@sina.com>
> To: "user" <user@spark.apache.org>
> Subject: All master are unreponsive issue
> Date: 2015-07-02 17:31
>
> Hi there:
>
> I ran into a problem: "Application has been killed. Reason: All
> masters are unresponsive! Giving up." I checked the network I/O and found
> that it is sometimes really high while my app is running. Please refer to
> the attached picture for more info.
>
> I also checked
> http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html,
> and set SPARK_LOCAL_IP in spark-env.sh on every node of my Spark cluster.
> However, it did not help solve this problem.
>
> I am not sure whether this parameter is set correctly. My settings look like this:
>
> On node1:
>
> export SPARK_LOCAL_IP={node1's IP}
>
> On node2:
>
> export SPARK_LOCAL_IP={node2's IP}
>
> ......
>
>
>
> BTW, I guess that Akka retries 3 times when communicating between
> master and slave. Is it possible to increase the Akka retries?
>
>
> And apart from expanding the network bandwidth, is there another way to
> solve this problem?
>
>
> Thanks in advance for any ideas.
>
> --------------------------------
>
> Thanks & Best regards!
> San.Luo
>
