From: Aaron Kimball
Date: Thu, 18 Jun 2009 13:26:34 -0700
Subject: Re: HDFS is not loading evenly across all nodes.
To: core-user@hadoop.apache.org

Did you run the dfs put commands from the master node? If you're inserting into HDFS from a machine running a DataNode, the local DataNode will always be chosen as one of the three replica targets. For more balanced loading, you should use an off-cluster machine as the point of origin.

If you experience uneven block distribution, you should also rebalance your cluster periodically by running bin/start-balancer.sh. It works in the background to move blocks from heavily laden nodes to underutilized ones.

- Aaron

On Thu, Jun 18, 2009 at 12:57 PM, openresearch <Qiming.He@openresearchinc.com> wrote:

> Hi all
>
> I "dfs put" a large dataset onto a 10-node cluster.
>
> When I observe the Hadoop progress (via web:50070) and each local file
> system (via df -k), I notice that my master node is hit 5-10 times harder
> than the others, so its hard drive fills up more quickly than the others.
> During last night's load, it actually crashed when the hard drive was full.
>
> To my understanding, data should wrap around all nodes evenly (in a
> round-robin fashion using 64M as a unit).
>
> Is this expected behavior of Hadoop? Can anyone suggest a good way to
> troubleshoot?
>
> Thanks
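
To make the replica-placement point in the reply concrete, here is a rough simulation of why a put issued from a DataNode skews storage toward that node. It is only a sketch under a simplified model, not HDFS's actual placement code: it assumes the writer's node receives one replica of every block and the remaining replicas land on distinct random other nodes, and it ignores rack awareness and free-space checks. The function name simulate_put and all of its parameters are made up for illustration.

```python
import random
from collections import Counter


def simulate_put(num_nodes=10, num_blocks=10_000, replication=3,
                 writer=None, seed=0):
    """Count block replicas per node under a simplified placement model.

    Assumption (not the real HDFS code): if `writer` is a node index, that
    node gets one replica of every block and the other replicas go to
    distinct random other nodes. If `writer` is None (off-cluster client),
    all replicas go to distinct random nodes.
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    counts = Counter({n: 0 for n in nodes})
    for _ in range(num_blocks):
        if writer is None:
            targets = rng.sample(nodes, replication)
        else:
            others = [n for n in nodes if n != writer]
            targets = [writer] + rng.sample(others, replication - 1)
        counts.update(targets)
    return counts


# Put issued from node 0 (for example, the master also running a DataNode).
on_cluster = simulate_put(writer=0)
# Put issued from a machine outside the cluster.
off_cluster = simulate_put(writer=None)

mean_others = sum(on_cluster[n] for n in range(1, 10)) / 9
ratio = on_cluster[0] / mean_others
print("writer node vs. average of the others:", round(ratio, 2))
print("off-cluster max/min:",
      round(max(off_cluster.values()) / min(off_cluster.values()), 2))
```

Under these assumptions, with 10 nodes and 3 replicas the writing node ends up holding roughly 4.5 times as many blocks as each of the others, which is in the same range as the imbalance described in the original question, while the off-cluster run stays close to even.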