From: John Lilley <john.lilley@redpoint.net>
To: user@hadoop.apache.org
Subject: RE: Question about writing HDFS files
Date: Fri, 17 May 2013 13:38:52 +0000
Right, sorry for the ambiguity, I was talking about HDFS writes only. So my application doesn't need to do anything to signal that it is writing from inside vs. outside of the Hadoop cluster; it figures that out from IP or hostname?

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Thursday, May 16, 2013 11:12 PM
To:
Subject: Re: Question about writing HDFS files

Thanks for the clarification, Rahul. In that case, the reading is correct (and an HDFS client behaves the same, in and out of MR -- it's not really related to MR at all).

A "client outside" would write to a random set of datanodes, across at least two racks for 3 replicas if rack awareness is turned on.

On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee wrote:
> Hi Harsh,
>
> I think what John meant by writing to local disk is writing to the
> same datanode first, i.e. the one that initiated the write call.
>
> John can further clarify.
>
>
> On Fri, May 17, 2013 at 4:23 AM, Harsh J wrote:
>>
>> That is not true. HDFS writes are not staged to a local disk first
>> before being written onto the DataNodes. The old architecture docs
>> seem to suggest that the writes get staged to a local disk, but that's
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>>
>> Also worth noting that an HDFS client behaves the same way in almost
>> all contexts, whether it's invoked from an MR framework or directly
>> from the shell.
>>
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley wrote:
>> > I seem to recall reading that when a MapReduce task writes a file,
>> > the blocks of the file are always written to local disk, and
>> > replicated to other nodes.
>> > If this is true, is this also true for
>> > non-MR applications writing to HDFS from Hadoop worker nodes? What
>> > about clients outside of the cluster doing a file load?
>> >
>> > Thanks
>> > John
>>
>> --
>> Harsh J

--
Harsh J
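To make the placement behavior discussed above concrete, here is a minimal, self-contained sketch (not Hadoop code) of the *default* HDFS replica-placement idea: the first replica goes to the client's own datanode if the client happens to be running on one (this is exactly the "inside vs. outside the cluster" decision, made from the client's host, with no application-side signaling), the second goes to a node on a different rack, and the third to another node on the second replica's rack. All names here are hypothetical; the real logic lives in the NameNode's BlockPlacementPolicy and handles many more constraints.

```python
import random


def choose_replica_targets(client_host, datanodes, replicas=3):
    """Simplified sketch of default HDFS replica placement.

    client_host: hostname of the writing client.
    datanodes: dict mapping datanode hostname -> rack id.
    Returns a list of `replicas` datanode hostnames.
    """
    # First replica: the local datanode if the client runs on one,
    # otherwise a random datanode. This is how "inside vs. outside"
    # is decided -- purely from the client's address.
    if client_host in datanodes:
        first = client_host
    else:
        first = random.choice(list(datanodes))
    targets = [first]

    # Second replica: a node on a different rack than the first,
    # so three replicas span at least two racks.
    off_rack = [h for h, r in datanodes.items() if r != datanodes[first]]
    second = random.choice(off_rack)
    targets.append(second)

    # Third replica: a different node on the same rack as the second.
    same_rack = [h for h, r in datanodes.items()
                 if r == datanodes[second] and h != second]
    if same_rack:
        third = random.choice(same_rack)
    else:
        third = random.choice([h for h in datanodes if h not in targets])
    targets.append(third)

    return targets[:replicas]
```

Under this sketch, a client on `dn1` always gets `dn1` as its first replica target, while an outside client gets a random first target -- matching Harsh's description that an outside client writes to a random set of datanodes across at least two racks.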