From: Telles Nobrega
Date: Mon, 09 Feb 2015 10:49:28 +0000
Subject: Re: Max Connect retries
To: "user@hadoop.apache.org"
Mailing-List: user@hadoop.apache.org

Thanks

On Mon Feb 09 2015 at 01:43:24 Xuan Gong wrote:

> That is the client connect retry at the IPC level.
>
> You can decrease the maximum number of retries by configuring
>
> ipc.client.connect.max.retries.on.timeouts
>
> in core-site.xml
>
> Thanks
>
> Xuan Gong
>
> From: Telles Nobrega
> Reply-To: "user@hadoop.apache.org"
> Date: Saturday, February 7, 2015 at 8:37 PM
> To: "user@hadoop.apache.org"
> Subject: Max Connect retries
>
> Hi, I changed my cluster config so that a failed NodeManager can be detected
> in about 30 seconds. When I run a wordcount, the reduce gets stuck at
> 25% for quite a while, and the logs show nodes trying to connect to the failed
> node:
>
> org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
>
> Is this the expected behaviour? Should I change max retries to a lower value? If so, which config is that?
>
> Thanks
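
For reference, the setting named in the reply could look like the following in core-site.xml. This is a minimal sketch, not a recommended configuration: the value 5 is a hypothetical choice for illustration, and the maxRetries=45 in the log above appears to correspond to this property's current value on the cluster.

```xml
<!-- core-site.xml: illustrative fragment only -->
<configuration>
  <property>
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <!-- Hypothetical value; pick one that fits your failure-detection window -->
    <value>5</value>
  </property>
</configuration>
```

Since this is an IPC client setting, it presumably has to be visible to the processes that open the connection (here, the tasks trying to reach the failed node), not only to the master daemons.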