Subject: heartbeat and timeout question
From: Patai Sangbutsarakum <silvianhadoop@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 21 May 2013 17:47:56 -0700

Hello Hadoopers,

I am going to migrate production racks of datanodes/tasktrackers onto new core switches. Rack awareness has been in place for a long time. I am looking for a way to mitigate recopying the blocks of the datanodes in the rack that is being moved (when they become dead nodes), and shifting the running tasks on those tasktrackers to other machines.

One approach I can think of is playing with the heartbeat/timeout settings of both the datanode and the tasktracker, making them extra long (say 15 minutes), so the namenode and jobtracker are more forgiving toward the nodes being moved. The network operation needed to flip each switch should only take a couple of minutes per rack. (A rough config sketch is at the bottom of this mail.)

Possible alternatives are more than welcome.

Thanks in advance,
P

btw, the cluster is on cdh3u4 (0.20 branch)
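
For reference, this is roughly what I mean by "playing with the heartbeat" -- a rough sketch only, based on what I can find for the 0.20 / cdh3 line, so please double-check the property names and defaults against your build before relying on them:

  <!-- hdfs-site.xml (namenode side): the namenode declares a datanode dead
       after roughly 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval,
       which is about 10.5 minutes with the defaults (300000 ms and 3 s). -->
  <property>
    <name>heartbeat.recheck.interval</name>
    <value>900000</value>   <!-- 15 min recheck -> dead-node timeout ~30 min -->
  </property>

  <!-- mapred-site.xml (jobtracker side): time before a silent tasktracker is
       declared lost and its running tasks are rescheduled (default 600000 ms). -->
  <property>
    <name>mapred.tasktracker.expiry.interval</name>
    <value>1800000</value>  <!-- 30 min -->
  </property>

Both would need a namenode/jobtracker restart to take effect, as far as I know, and I would plan to revert them right after the move so genuinely dead nodes are not masked for that long in normal operation.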