hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Question about writing HDFS files
Date Fri, 17 May 2013 05:12:00 GMT
Thanks for the clarification, Rahul. In that case, the reading is
correct (and an HDFS client behaves the same in and out of MR; it's
not really related to MR at all).

A "client outside" would write to a random set of DataNodes, across at
least two racks for 3 replicas if rack awareness is turned on.
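
To make the placement behaviour above concrete, here is a minimal Python
sketch of the commonly described default policy for 3 replicas: first
replica on the writer's own node if the writer is a DataNode (otherwise a
random node), second on a node in a different rack, third on another node
in the second replica's rack. The function name, the topology dictionary
and the node names are hypothetical, and the sketch assumes at least two
racks with at least two nodes each. It is not HDFS code; the real policy
also weighs things such as node load and free space.

```python
import random


def choose_replica_nodes(topology, writer=None, rng=random):
    """Pick 3 nodes for one block (simplified illustration only).

    topology: dict mapping rack name -> list of DataNode names (hypothetical).
    writer:   name of the node running the client, or None when the client
              is outside the cluster.
    """
    node_rack = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = list(node_rack)

    # First replica: the writer's own node if it is a DataNode, else random.
    first = writer if writer in node_rack else rng.choice(all_nodes)

    # Second replica: any node on a different rack from the first.
    remote = [n for n in all_nodes if node_rack[n] != node_rack[first]]
    second = rng.choice(remote)

    # Third replica: a different node on the same rack as the second.
    neighbours = [n for n in topology[node_rack[second]] if n != second]
    third = rng.choice(neighbours)

    return [first, second, third]


if __name__ == "__main__":
    racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print("writer on dn1:  ", choose_replica_nodes(racks, writer="dn1"))
    print("outside client: ", choose_replica_nodes(racks))
```

With a writer on dn1 the first entry is always dn1; with an outside
client all three nodes are picked at random, but they still span two racks.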

On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee
<rahul.rec.dgp@gmail.com> wrote:
> Hi Harsh,
>
> I think what John meant by writing to local disk is writing to the same data
> node first which has initiated the write call.
>
> John can further clarify.
>
>
> On Fri, May 17, 2013 at 4:23 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>> That is not true. HDFS writes are not staged to a local disk first
>> before being written onto the DataNodes. The old architecture docs
>> seem to suggest that the writes get staged to a local disk, but that's
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>>
>> Also worth noting that an HDFS client behaves the same way in almost
>> all contexts, whether it's invoked from an MR framework or directly
>> from the shell.
>>
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley <john.lilley@redpoint.net> wrote:
>> > I seem to recall reading that when a MapReduce task writes a file, the
>> > blocks of the file are always written to local disk, and replicated to
>> > other nodes.  If this is true, is this also true for non-MR applications
>> > writing to HDFS from Hadoop worker nodes?  What about clients outside of
>> > the cluster doing a file load?
>> >
>> > Thanks
>> >
>> > John
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
