From: "J. Rottinghuis"
To: user@hadoop.apache.org
Date: Fri, 17 May 2013 08:24:18 -0700
Subject: Re: Question about writing HDFS files

Yes.

Joep


On Fri, May 17, 2013 at 6:38 AM, John Lilley <john.lilley@redpoint.net> wrote:
Right, sorry for the ambiguity, I was talking about HDFS writes only.

So my application doesn't need to do anything to signal that it is writing from inside vs. outside of the Hadoop cluster, it figures that out from IP or hostname?


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Thursday, May 16, 2013 11:12 PM
To: <user@hadoop.apache.org>
Subject: Re: Question about writing HDFS files

Thanks for the clarification, Rahul. In that case, the reading is correct (and an HDFS client behaves the same in and out of MR; it's not really related to MR at all).

A "client outside" would write to a random set of DataNodes, across at least two racks for 3 replicas if rack awareness is turned on.
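
To make the inside vs. outside difference concrete, here is a rough Python sketch of the placement behaviour described above. It is a toy model and not HDFS code: the function name `choose_replica_nodes`, the rack map, and the host names are all made up for illustration, and the real NameNode policy also weighs things like free space and node load. The assumption is the usual default for 3 replicas: first replica on the writer's own node if that host is a DataNode (otherwise a random node), second on a different rack, third on the same rack as the second.

```python
import random


def choose_replica_nodes(writer_host, racks, replication=3, rng=None):
    """Toy model of replica placement (hypothetical helper, not an HDFS API).

    racks maps a rack name to the list of DataNode host names on that rack.
    """
    rng = rng or random.Random()
    node_rack = {node: rack for rack, nodes in racks.items() for node in nodes}
    all_nodes = sorted(node_rack)

    # First replica: the writer's own node when the writer is a DataNode,
    # otherwise (a client outside the cluster) a randomly chosen node.
    first = writer_host if writer_host in node_rack else rng.choice(all_nodes)
    chosen = [first]

    if replication >= 2:
        # Second replica: prefer a node on a different rack than the first.
        remote = [n for n in all_nodes if node_rack[n] != node_rack[first]]
        pool = remote or [n for n in all_nodes if n not in chosen]
        if pool:
            chosen.append(rng.choice(pool))

    if replication >= 3 and len(chosen) == 2:
        # Third replica: prefer another node on the second replica's rack.
        second_rack = node_rack[chosen[1]]
        same_rack = [n for n in racks[second_rack] if n not in chosen]
        pool = same_rack or [n for n in all_nodes if n not in chosen]
        if pool:
            chosen.append(rng.choice(pool))

    # Any further replicas: random nodes that have not been used yet.
    remaining = [n for n in all_nodes if n not in chosen]
    rng.shuffle(remaining)
    chosen.extend(remaining[: max(0, replication - len(chosen))])
    return chosen


if __name__ == "__main__":
    cluster = {"/rack1": ["dn1", "dn2"], "/rack2": ["dn3", "dn4"]}
    print("writer on dn1:     ", choose_replica_nodes("dn1", cluster))
    print("writer outside:    ", choose_replica_nodes("edge-host", cluster))
```

Note that the caller passes nothing that says "I am inside the cluster"; in this sketch the decision is made purely by looking the writer's host name up in the set of known DataNodes.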

On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:
> Hi Harsh,
>
> I think what John meant by writing to local disk is writing to the
> same data node first which has initiated the write call.
>
> John can further clarify.
>
>
> On Fri, May 17, 2013 at 4:23 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>> That is not true. HDFS writes are not staged to a local disk first
>> before being written onto the DataNodes. The old architecture docs
>> seem to suggest that the writes get staged to a local disk, but that's
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>>
>> Also worth noting that an HDFS client behaves the same way in almost
>> all contexts, whether it's invoked from an MR framework or directly
>> from the shell.
>>
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley
>> <john.lilley@redpoint.net>
>> wrote:
>> > I seem to recall reading that when a MapReduce task writes a file,
>> > the blocks of the file are always written to local disk, and
>> > replicated to other nodes.  If this is true, is this also true for
>> > non-MR applications writing to HDFS from Hadoop worker nodes?  What
>> > about clients outside of the cluster doing a file load?
>> >
>> > Thanks
>> >
>> > John
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



--
Harsh J
