Date: Wed, 19 Aug 2015 10:40:41 -0700
Subject: App Master takes ~30min to re-schedule task attempts.
From: manoj
To: user@hadoop.apache.org

Hello all,

I'm running Apache Hadoop 2.6.0.
I'm trying to remove a node from a Hadoop cluster and then add it back.
The task attempts on the node that was removed are rescheduled only after 30 min.

During this 30 min period, it looks like the App Master keeps trying to connect (see the log below) to the same node that was removed. After about 30 min it reschedules the task attempts from the lost node, and the job eventually succeeds.

How can I reduce the 30 min wait time?

.....
......
2015-08-14 11:25:21,662 INFO [ContainerLauncher #7] org.apache.hadoop.ipc.Client: Retrying connect to server: host172/XX.XX.XX.XX:36158. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
......
......
-Thanks
--Manoj Kumar M
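
A rough sanity check on the log line above, offered as a sketch and not as a diagnosis: the retry policy it prints (maxRetries=10, sleepTime=1000 ms) accounts for only about 10 seconds of sleeping per round of retries, far short of the 30 min reported. The helper names below are hypothetical, and the calculation deliberately ignores per-attempt connect timeouts and any outer retry loops, which the log excerpt does not show.

```python
import re

# Log line copied from the message above (address redacted as in the original).
LOG_LINE = (
    "2015-08-14 11:25:21,662 INFO [ContainerLauncher #7] "
    "org.apache.hadoop.ipc.Client: Retrying connect to server: "
    "host172/XX.XX.XX.XX:36158. Already tried 0 time(s); retry policy is "
    "RetryUpToMaximumCountWithFixedSleep(maxRetries=10, "
    "sleepTime=1000 MILLISECONDS)"
)


def parse_retry_policy(line):
    """Return (max_retries, sleep_ms) parsed from a log line, or None if absent."""
    match = re.search(r"maxRetries=(\d+),\s*sleepTime=(\d+) MILLISECONDS", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def total_sleep_seconds(max_retries, sleep_ms):
    """Sleep accumulated over one full round of retries.

    Assumption: only the fixed sleep between attempts is counted; the time
    each connect attempt itself takes to fail is not included.
    """
    return max_retries * sleep_ms / 1000.0


max_retries, sleep_ms = parse_retry_policy(LOG_LINE)
per_round = total_sleep_seconds(max_retries, sleep_ms)
observed_wait = 30 * 60  # seconds, taken from the report above

print("sleep per retry round (s):", per_round)
print("rounds needed to fill 30 min by sleep alone:", observed_wait / per_round)
```

Since one round of this policy sleeps for so little time, the remaining delay presumably comes from connect timeouts on each attempt, from the round being repeated, or from another layer; the excerpt alone does not say which.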