spark-dev mailing list archives

From Vadim Semenov <vadim.seme...@datadoghq.com>
Subject Re: Task failures and other problems
Date Thu, 09 Nov 2017 22:38:53 GMT
Probably not Oracle but Cloudera 🙂

Jan, I think your DataNodes might be overloaded. If you run executors
alongside DataNodes, I'd suggest reducing `spark.executor.cores` so the
DataNode process gets some resources.

The other thing you can do is increase `dfs.client.socket-timeout` in the
Hadoop configuration; I see it's currently set to 120000 (ms) in your case.
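A sketch of how both suggestions could be applied at submit time. This assumes a standard `spark-submit` invocation; Hadoop client properties can be forwarded via the `spark.hadoop.` prefix. The values shown (3 cores, 300000 ms) are illustrative placeholders, not recommendations from this thread, and the application jar/path is hypothetical:

```shell
# Reduce executor cores to leave headroom for the co-located DataNode,
# and raise the HDFS client socket timeout (milliseconds).
# "spark.hadoop.<key>" forwards <key> into the Hadoop Configuration.
spark-submit \
  --conf spark.executor.cores=3 \
  --conf spark.hadoop.dfs.client.socket-timeout=300000 \
  --class com.example.MyJob \
  my-job.jar
```

The same `dfs.client.socket-timeout` value could instead be set cluster-wide in `hdfs-site.xml`, but the `--conf` route keeps the change scoped to one job while testing.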

On Thu, Nov 9, 2017 at 4:28 PM, Jan-Hendrik Zab <zab@l3s.de> wrote:

>
> Jörn Franke <jornfranke@gmail.com> writes:
>
> > Maybe contact Oracle support?
>
> Something like that would be the last option I guess, university money
> is usually hard to come by for such things.
>
> > Do you have maybe accidentally configured some firewall rules? Routing
> > issues? Maybe only one of the nodes...
>
> All systems are in the same /16; the nodes don't even have a firewall,
> and the two masters allow everything from the nodes and masters via the
> InfiniBand devices.
>
> And as I said, MapReduce jobs work fine, and I haven't seen a single
> network problem so far apart from these messages.
>
> Best,
>         -jhz
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
