hadoop-mapreduce-user mailing list archives

From Telles Nobrega <tellesnobr...@gmail.com>
Subject Re: Max Connect retries
Date Mon, 09 Feb 2015 18:25:18 GMT
It did finish, but it took hours, and in one case it didn't finish at all.
The same thing happened when running the pi estimator.

On Mon Feb 09 2015 at 15:24:11 daemeon reiydelle <daemeonr@gmail.com> wrote:

> Are your nodes actually stuck or are you in e.g. a reduce step that is
> reading so much data across the network that the node SEEMS unreachable?
>
>
> Since you mention "gets stuck for a while at 25%", that suggests that
> eventually the node finishes up its work ...
>
> *“Life should not be a journey to the grave with the intention of arriving
> safely in a pretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!” - Hunter Thompson*
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Mon, Feb 9, 2015 at 2:49 AM, Telles Nobrega <tellesnobrega@gmail.com>
> wrote:
>
>> Thanks
>>
>> On Mon Feb 09 2015 at 01:43:24 Xuan Gong <xgong@hortonworks.com> wrote:
>>
>>>  That is the client connect retry at the IPC level.
>>>
>>> You can decrease the max.retries by configuring
>>>
>>> ipc.client.connect.max.retries.on.timeouts
>>>
>>> in core-site.xml
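
A minimal sketch of what that setting might look like, assuming it goes into core-site.xml on the nodes whose IPC clients do the connecting. The value 5 is purely illustrative and is not a recommendation made in this thread; the maxRetries=45 in the quoted log appears to correspond to this property's default.

```xml
<!-- core-site.xml fragment; the value below is an illustrative assumption -->
<configuration>
  <property>
    <!-- Number of times an IPC client retries a connection that timed out -->
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <value>5</value>
  </property>
</configuration>
```

A lower value makes clients give up on an unreachable node sooner, but it also makes them less tolerant of nodes that are only briefly slow to respond.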
>>>
>>>
>>>  Thanks
>>>
>>>  Xuan Gong
>>>
>>>   From: Telles Nobrega <tellesnobrega@gmail.com>
>>> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> Date: Saturday, February 7, 2015 at 8:37 PM
>>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> Subject: Max Connect retries
>>>
>>>   Hi, I changed my cluster config so a failed nodemanager can be
>>> detected in about 30 seconds. When I'm running a wordcount, the reduce gets
>>> stuck at 25% for quite a while and the logs show nodes trying to connect to the
>>> failed node:
>>>
>>>  org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
>>> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
>>>
>>> Is this the expected behaviour? Should I change max retries to a lower value? If so, which config is that?
>>>
>>> Thanks
>>>
>>>
>>>
>
