Mailing-List: common-user@hadoop.apache.org
From: Allen Wittenauer
Date: Tue, 13 Jul 2010 16:51:54 +0000
Subject: Re: using 'fs -put' from datanode: all data written to that node's hdfs and not distributed

When you write on a machine running a datanode process, the data is *always* written locally first: the first replica of every block goes to that machine's own datanode.
This is an optimization for the MapReduce framework: tasks running on a datanode write their output to local disk first, saving a network hop. The lesson here is that you should *never* use a datanode machine to load your data. Always load it from a machine outside the grid.

Additionally, you can run

    hadoop fsck <filename> -files -locations -blocks

to see where those blocks have been written.

On Jul 13, 2010, at 9:45 AM, Nathan Grice wrote:

> To test the block distribution, run the same put command from the NameNode
> and then again from the DataNode.
> Check the HDFS filesystem after both commands. In my case, a 2GB file was
> distributed mostly evenly across the datanodes when put was run on the
> NameNode, but was written only to the local DataNode when put was run
> from that DataNode.
>
> On Tue, Jul 13, 2010 at 9:32 AM, C.V.Krishnakumar wrote:
>
>> Hi,
>> I am a newbie. I am curious to know how you discovered that all the blocks
>> are written to the datanode's hdfs? I thought the replication by the
>> namenode was transparent. Am I missing something?
>> Thanks,
>> Krishna
>> On Jul 12, 2010, at 4:21 PM, Nathan Grice wrote:
>>
>>> We are trying to load data into hdfs from one of the slaves, and when the
>>> put command is run from a slave (datanode) all of the blocks are written to
>>> the datanode's hdfs and not distributed to all of the nodes in the cluster.
>>> It does not seem to matter what destination format we use (/filename vs
>>> hdfs://master:9000/filename); it always behaves the same.
>>> Conversely, running the same command from the namenode distributes the
>>> files across the datanodes.
>>>
>>> Is there something I am missing?
>>>
>>> -Nathan
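The behaviour discussed in this thread can be illustrated with a small simulation. This is a minimal sketch under a simplified model, not HDFS's actual code: it assumes the first replica of each block goes to the writer's own node when the writer is a datanode, and to a randomly chosen datanode otherwise. It ignores the second and third replicas, rack awareness and node load, and the 64 MB block size, the hostnames and the function names are illustrative assumptions. With replication above 1 the other copies still go to other nodes, but the writer's datanode ends up holding a copy of every block, which is what the put-from-a-datanode test shows.

```python
import random
from collections import Counter

BLOCK_SIZE_MB = 64  # assumed block size (the Hadoop 0.20-era default)


def first_replica_node(writer, datanodes, rng):
    """Hypothetical helper for the simplified rule: place the first replica on
    the writer's node if it is a datanode, otherwise on a random datanode."""
    if writer in datanodes:
        return writer
    return rng.choice(datanodes)


def place_file(size_mb, writer, datanodes, seed=0):
    """Return a Counter mapping datanode -> number of first replicas it holds."""
    rng = random.Random(seed)
    n_blocks = -(-size_mb // BLOCK_SIZE_MB)  # ceiling division: a partial block still counts
    return Counter(first_replica_node(writer, datanodes, rng) for _ in range(n_blocks))


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical datanode hostnames
    # 2 GB file loaded from a machine that is not a datanode (e.g. the namenode host)
    print("from master:", dict(place_file(2048, "master", nodes)))
    # Same file loaded from datanode dn2: every first replica lands on dn2
    print("from dn2:   ", dict(place_file(2048, "dn2", nodes)))
```

Run from "dn2", all 32 first replicas land on dn2, while a load from "master" spreads them across the four datanodes. That spread is why loading from outside the grid gives the even distribution.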