hadoop-mapreduce-user mailing list archives

From "J. Rottinghuis" <jrottingh...@gmail.com>
Subject Re: Question about writing HDFS files
Date Fri, 17 May 2013 15:24:18 GMT
Yes.

Joep


On Fri, May 17, 2013 at 6:38 AM, John Lilley <john.lilley@redpoint.net> wrote:

> Right, sorry for the ambiguity, I was talking about HDFS writes only.
>
> So my application doesn't need to do anything to signal that it is writing
> from inside vs. outside of the Hadoop cluster, it figures that out from IP
> or hostname?
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Thursday, May 16, 2013 11:12 PM
> To: <user@hadoop.apache.org>
> Subject: Re: Question about writing HDFS files
>
> Thanks for the clarification, Rahul. In that case, the reading is
> correct (and an HDFS client behaves the same in and out of MR; it's
> not really related to MR at all).
>
> A "client outside" would write to a random set of DataNodes, across at
> least two racks for 3 replicas if rack awareness is turned on.
>
> On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
> > Hi Harsh,
> >
> > I think what John meant by writing to local disk is writing first to
> > the same DataNode that initiated the write call.
> >
> > John can further clarify.
> >
> >
> > On Fri, May 17, 2013 at 4:23 AM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >> That is not true. HDFS writes are not staged to a local disk first
> >> before being written onto the DataNodes. The old architecture docs
> >> seem to suggest that the writes get staged to a local disk but that's
> >> not true anymore, see https://issues.apache.org/jira/browse/HDFS-1454.
> >>
> >> Also worth noting that an HDFS client behaves the same way in almost
> >> all contexts, whether it's invoked from an MR framework or directly
> >> from a shell.
> >>
> >> On Fri, May 17, 2013 at 3:38 AM, John Lilley
> >> <john.lilley@redpoint.net>
> >> wrote:
> >> > I seem to recall reading that when a MapReduce task writes a file,
> >> > the blocks of the file are always written to local disk, and
> >> > replicated to other nodes.  If this is true, is this also true for
> >> > non-MR applications writing to HDFS from Hadoop worker nodes?  What
> >> > about clients outside of the cluster doing a file load?
> >> >
> >> > Thanks
> >> >
> >> > John
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
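
The placement behaviour discussed in this thread can be illustrated with a small model. This is only a sketch of the commonly documented default policy for 3 replicas (first replica on the writer's own node when the writer is a DataNode, otherwise on a random node; second on a different rack; third on another node in the second replica's rack). It is not Hadoop code: `choose_replicas`, `_pick`, and the node and rack names are hypothetical, and real HDFS also weighs load, free space, and node health.

```python
import random


def _pick(preferred, fallback, used, rng):
    """Pick an unused node from `preferred`, else from `fallback`."""
    pool = [n for n in preferred if n not in used]
    if not pool:
        pool = [n for n in fallback if n not in used]
    if not pool:
        raise ValueError("not enough nodes for the requested replicas")
    return rng.choice(pool)


def choose_replicas(topology, writer=None, rng=random):
    """Return 3 target nodes for one block.

    topology: dict mapping node name -> rack name (hypothetical model).
    writer:   host issuing the write; it counts as "inside" the cluster
              only if it appears in `topology`.
    """
    nodes = list(topology)
    # Replica 1: the writer's own node if it is a DataNode, else any node.
    first = writer if writer in topology else rng.choice(nodes)
    used = [first]
    # Replica 2: prefer a node on a different rack than replica 1.
    off_rack = [n for n in nodes if topology[n] != topology[first]]
    second = _pick(off_rack, nodes, used, rng)
    used.append(second)
    # Replica 3: prefer another node on the same rack as replica 2.
    same_rack = [n for n in nodes if topology[n] == topology[second]]
    third = _pick(same_rack, nodes, used, rng)
    return [first, second, third]


if __name__ == "__main__":
    cluster = {
        "dn1": "rackA", "dn2": "rackA", "dn3": "rackA",
        "dn4": "rackB", "dn5": "rackB", "dn6": "rackB",
    }
    print("inside writer :", choose_replicas(cluster, writer="dn1"))
    print("outside writer:", choose_replicas(cluster, writer="edge-host"))
```

In this model the caller passes no "inside" or "outside" flag. The only input that changes the result is whether the writer's host is a known node, which matches the answer given at the top of the thread.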
