Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
Date: Wed, 19 Aug 2015 10:40:41 -0700
Message-ID: 
 <CAHNHubZNvp9U4J56hF=Wn8UYAPzPSfeM7Nu-60DFuXZ2rT2U3Q@mail.gmail.com>
Subject: App Master takes ~30min to re-schedule task attempts.
From: manoj <manojm.321@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a11c303a697e5f2051dad88d8

--001a11c303a697e5f2051dad88d8
Content-Type: text/plain; charset=UTF-8

Hello all,

I'm running Apache Hadoop 2.6.0.
I'm trying to remove a node from a Hadoop cluster and then add it back.
The task attempts on the removed node are rescheduled only after about
30 minutes.

During this 30-minute period, it looks like the App Master keeps trying to
connect to the same removed node (see the log below). After about 30
minutes it reschedules the task attempts from the lost node, and the job
eventually succeeds.

How can I reduce the 30-minute wait time?

.....
......
2015-08-14 11:25:21,662 INFO [ContainerLauncher #7]
org.apache.hadoop.ipc.Client: Retrying connect to server:
host172/XX.XX.XX.XX:36158. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
......
......
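
For scale, the retry policy in the log line above can be turned into a
rough time budget. This is only a minimal sketch: it assumes that just the
fixed sleeps count and ignores any per-attempt connect timeout, which the
log does not show. The function names are illustrative, not Hadoop APIs.

```python
# Figures copied from the log line: maxRetries=10, sleepTime=1000 MILLISECONDS.
MAX_RETRIES = 10
SLEEP_MS = 1000
OBSERVED_DELAY_MIN = 30  # delay observed before task attempts are rescheduled


def sleep_budget_seconds(max_retries: int, sleep_ms: int) -> float:
    """Time spent sleeping in one full pass of a fixed-sleep retry policy."""
    return max_retries * sleep_ms / 1000.0


def passes_needed(observed_min: float, per_pass_s: float) -> float:
    """How many full retry passes would be needed to fill the observed delay."""
    return observed_min * 60 / per_pass_s


per_pass = sleep_budget_seconds(MAX_RETRIES, SLEEP_MS)
print(per_pass)                                     # 10.0
print(passes_needed(OBSERVED_DELAY_MIN, per_pass))  # 180.0
```

Under that assumption one pass of the logged policy accounts for about 10
seconds, so the 30-minute delay cannot come from a single pass of this
policy alone.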

-Thanks
--Manoj Kumar M

--001a11c303a697e5f2051dad88d8--