hadoop-common-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Question about writing HDFS files
Date Fri, 17 May 2013 13:38:52 GMT
Right, sorry for the ambiguity, I was talking about HDFS writes only.

So my application doesn't need to do anything to signal whether it is writing from inside or
outside of the Hadoop cluster? HDFS figures that out from the IP address or hostname?


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, May 16, 2013 11:12 PM
To: <user@hadoop.apache.org>
Subject: Re: Question about writing HDFS files

Thanks for the clarification, Rahul. In that case, the reading is correct (and an
HDFS client behaves the same in and out of MR; it's not really related to MR at all).

A "client outside" would write to a random set of datanodes, spread across at least two racks
for 3 replicas if rack awareness is turned on.
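
The placement behaviour described above can be sketched as a small simulation. This is only a simplified model of the default replica placement rule, not the NameNode's actual code: the function name choose_replica_targets, the topology dictionary, and the node and rack names are all made up for illustration. It assumes the first replica goes to the writer's own node when the writer is a datanode (otherwise to a random node), the second to a node on a different rack, and the third to another node on the second replica's rack.

```python
import random


def choose_replica_targets(topology, writer=None, replication=3, rng=random):
    """Pick datanodes for one block (simplified, hypothetical model).

    topology: dict mapping rack name -> list of datanode names.
    writer:   host the client runs on; may or may not be a datanode.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = sorted(rack_of)
    targets = []

    def pick(preferred):
        # Prefer the given candidates; fall back to any node not yet used.
        pool = [n for n in preferred if n not in targets]
        if not pool:
            pool = [n for n in all_nodes if n not in targets]
        return rng.choice(pool) if pool else None

    # Replica 1: the writer's own node if it is a datanode, else a random node.
    first = writer if writer in rack_of else rng.choice(all_nodes)
    targets.append(first)

    if replication >= 2:
        # Replica 2: a node on a different rack from the first replica.
        second = pick([n for n in all_nodes if rack_of[n] != rack_of[first]])
        if second is None:
            return targets
        targets.append(second)

    if replication >= 3:
        # Replica 3: another node on the same rack as the second replica.
        third = pick(topology[rack_of[second]])
        if third is None:
            return targets
        targets.append(third)

    return targets


if __name__ == "__main__":
    cluster = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print("inside: ", choose_replica_targets(cluster, writer="dn1"))
    print("outside:", choose_replica_targets(cluster, writer="laptop"))
```

In this model the client does not signal where it is running; the only input is the writer's host name, and a writer that is not in the topology is treated as "outside".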

On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:
> Hi Harsh,
>
> I think what John meant by writing to local disk is writing to the 
> same data node first which has initiated the write call.
>
> John can further clarify.
>
>
> On Fri, May 17, 2013 at 4:23 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>> That is not true. HDFS writes are not staged to a local disk first 
>> before being written onto the DataNodes. The old architecture docs 
>> seem to suggest that the writes get staged to a local disk, but that's 
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>>
>> Also worth noting that an HDFS client behaves the same way in almost 
>> all contexts, whether it's invoked from an MR framework or directly 
>> from the shell.
>>
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley 
>> <john.lilley@redpoint.net>
>> wrote:
>> > I seem to recall reading that when a MapReduce task writes a file, 
>> > the blocks of the file are always written to local disk, and 
>> > replicated to other nodes.  If this is true, is this also true for 
>> > non-MR applications writing to HDFS from Hadoop worker nodes?  What 
>> > about clients outside of the cluster doing a file load?
>> >
>> > Thanks
>> >
>> > John
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



--
Harsh J
