Subject: Re: App Master takes ~30min to re-schedule task attempts.
From: Susheel Kumar Gadalay
To: user@hadoop.apache.org
Date: Thu, 20 Aug 2015 11:34:41 +0530

Change mapreduce.reduce.shuffle.connect.timeout and
mapreduce.reduce.shuffle.read.timeout. By default they are both 180000
(milliseconds).

On 8/20/15, manoj wrote:
> Hello all,
>
> I'm running Apache Hadoop 2.6.0.
> I'm trying to remove a node from a Hadoop cluster and then add it back.
> The task attempts on the node which was removed are rescheduled only after
> 30min.
>
> During this 30min period it looks like the App Master is trying to connect
> (check the log below) to the same node which was removed, and after about
> 30min it reschedules those task attempts from the lost node and eventually
> the job succeeds.
>
> How can I reduce the 30min wait time?
>
> .....
> ......
> 2015-08-14 11:25:21,662 INFO [ContainerLauncher #7]
> org.apache.hadoop.ipc.Client: Retrying connect to server:
> host172/XX.XX.XX.XX:36158. Already tried 0 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> ......
> ......
>
> Thanks
> --Manoj Kumar M
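
For reference, here is a minimal sketch of how the two shuffle timeouts named in the reply could be written out as overrides. Only the two property names and their 180000 default come from the thread. The helper names (`shuffle_timeout_properties`, `render`, `as_generic_options`) and the 30000 ms values are hypothetical, chosen for illustration, and are not a recommendation from the thread.

```python
import xml.etree.ElementTree as ET

# Property names as given in the reply; both default to 180000 (milliseconds).
CONNECT_TIMEOUT = "mapreduce.reduce.shuffle.connect.timeout"
READ_TIMEOUT = "mapreduce.reduce.shuffle.read.timeout"


def shuffle_timeout_properties(connect_ms, read_ms):
    """Build a <configuration> element holding the two timeout overrides."""
    if connect_ms <= 0 or read_ms <= 0:
        raise ValueError("timeouts must be positive millisecond values")
    configuration = ET.Element("configuration")
    for name, value in ((CONNECT_TIMEOUT, connect_ms), (READ_TIMEOUT, read_ms)):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return configuration


def render(configuration):
    """Serialize the element to an XML string (unindented)."""
    return ET.tostring(configuration, encoding="unicode")


def as_generic_options(connect_ms, read_ms):
    """Same overrides as -D arguments; assumes the job parses generic options."""
    return [
        "-D{}={}".format(CONNECT_TIMEOUT, connect_ms),
        "-D{}={}".format(READ_TIMEOUT, read_ms),
    ]


if __name__ == "__main__":
    # 30000 ms is an illustrative value only.
    print(render(shuffle_timeout_properties(30000, 30000)))
    print(" ".join(as_generic_options(30000, 30000)))
```

The `<property>` elements would go inside the `<configuration>` block of mapred-site.xml. The `-D` form only takes effect for jobs whose driver passes its arguments through Hadoop's generic option parsing, so treat it as an assumption about the job in question.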