From: "Dhruba Borthakur"
Subject: RE: Redistribute blocks evenly across DFS
Date: Wed, 16 May 2007 10:08:07 -0700

I think HDFS always makes every effort to fill up most Datanodes uniformly. An anomaly arises when a large set of Datanodes is added to an existing cluster. In this case one possible approach would be to write a tool that does the following:

1. Increase the replication factor of each file. This will automatically create a new replica on those nodes that have more free disk space and are lightly loaded.

2. Then decrease the replication factor of the file to its original value. The HDFS code will automatically select the replica on the most-full node to be deleted (see HADOOP-1300).

The tool could take a set of HDFS directories as input and then do the above two steps on all files (recursively) in the set of specified directories.

Will this approach address your issue?

Thanks,
dhruba

-----Original Message-----
From: Dennis Kubes [mailto:nutch-dev@dragonflymc.com]
Sent: Wednesday, May 16, 2007 9:11 AM
To: hadoop-user@lucene.apache.org
Subject: Redistribute blocks evenly across DFS

Is there a way to redistribute blocks evenly across all DFS nodes? If not, I would be happy to program a tool to do so, but I would need a little guidance on how to go about it.

Dennis Kubes
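The raise-then-restore procedure suggested in the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool from the thread: it assumes the installed Hadoop release provides `hadoop fs -setrep` with a `-w` (wait) flag, and it assumes the caller already knows each file's original replication factor. The names `setrep_command`, `plan_rebalance`, and `run_plan`, and the demo paths, are hypothetical. The sketch waits on the raise step so the extra replica exists before the factor is lowered again; without that, lowering it straight away would probably move nothing.

```python
import subprocess
from typing import Callable, Dict, List

Command = List[str]


def setrep_command(path: str, replication: int, wait: bool) -> Command:
    """Build one `hadoop fs -setrep` command line (assumed CLI form)."""
    cmd = ["hadoop", "fs", "-setrep"]
    if wait:
        # Assumption: -w makes the shell wait until the target is reached.
        cmd.append("-w")
    cmd += [str(replication), path]
    return cmd


def plan_rebalance(original: Dict[str, int], extra: int = 1) -> List[Command]:
    """Return raise-then-restore commands for every file, in path order.

    `original` maps each file path to its current replication factor;
    how that mapping is collected is left to the caller.
    """
    plan: List[Command] = []
    for path, rep in sorted(original.items()):
        # Step 1: raise replication and wait for the new replica(s).
        plan.append(setrep_command(path, rep + extra, wait=True))
        # Step 2: restore the original factor; HDFS picks what to delete.
        plan.append(setrep_command(path, rep, wait=False))
    return plan


def run_plan(plan: List[Command],
             runner: Callable[[Command], object] = subprocess.check_call) -> None:
    """Execute each command in order; `runner` can be swapped for a dry run."""
    for cmd in plan:
        runner(cmd)


if __name__ == "__main__":
    # Hypothetical files, both with replication factor 3; print a dry run.
    demo = {"/data/part-00000": 3, "/data/part-00001": 3}
    for cmd in plan_rebalance(demo):
        print(" ".join(cmd))
```

Passing `print` or a list's `append` as `runner` gives a dry run, which seems prudent before temporarily raising replication across a large directory tree, since step 1 needs enough free space to hold the extra replicas.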