hadoop-mapreduce-user mailing list archives

From Mapred Learn <mapred.le...@gmail.com>
Subject Re: Are hadoop fs commands serial or parallel
Date Fri, 27 May 2011 05:07:19 GMT
Hi guys,
Another related question: when you run hadoop fs -copyFromLocal or call
fs.write() through the API, does it write to the local filesystem first,
before writing to HDFS? From what I read, it writes to the local filesystem
until the block size is reached and then writes to HDFS.
Wouldn't the HDFS client choke on local-filesystem writes if multiple such
fs -copyFromLocal commands are running? I thought that at least with
fs.write(), if you provide a byte array, it should not write to the local
filesystem.

Could somebody explain how fs -copyFromLocal and fs.write() work? Do they
write to the local filesystem until the block size is reached and then write
to HDFS, or do they write directly to HDFS?

Thanks in advance,
-JJ

On Wed, May 18, 2011 at 9:39 AM, Patrick Angeles <patrick@cloudera.com> wrote:

> Kinda clunky, but you could do this via shell:
>
> for FILE in $LIST_OF_FILES ; do
>   hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" &
> done
> wait  # block until all background copies finish
>
> If doing this via the Java API, then, yes you will have to use multiple
> threads.
>
> On Wed, May 18, 2011 at 1:04 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
>
> > Thanks Harsh!
> > That basically means both the APIs and the hadoop client commands allow only
> > serial writes.
> > I was wondering what other ways there are to write data in parallel to HDFS,
> > other than using multiple parallel threads.
> >
> > Thanks,
> > JJ
> >
> > Sent from my iPhone
> >
> > On May 17, 2011, at 10:59 PM, Harsh J <harsh@cloudera.com> wrote:
> >
> > > Hello,
> > >
> > > Adding to Joey's response, copyFromLocal's current implementation is serial
> > > given a list of files.
> > >
> > > On Wed, May 18, 2011 at 9:57 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
> > >> Thanks Joey!
> > >> I will try to find out about copyFromLocal. Looks like the Hadoop APIs write
> > >> serially, as you pointed out.
> > >>
> > >> Thanks,
> > >> -JJ
> > >>
> > >> On May 17, 2011, at 8:32 PM, Joey Echeverria <joey@cloudera.com> wrote:
> > >>
> > >>> The sequence file writer definitely does it serially as you can only
> > >>> ever write to the end of a file in Hadoop.
> > >>>
> > >>> Doing copyFromLocal could write multiple files in parallel (I'm not
> > >>> sure if it does or not), but a single file would be written serially.
> > >>>
> > >>> -Joey
> > >>>
> > >>> On Tue, May 17, 2011 at 5:44 PM, Mapred Learn <mapred.learn@gmail.com> wrote:
> > >>>> Hi,
> > >>>> My question is: when I run a command from an HDFS client, e.g. hadoop fs
> > >>>> -copyFromLocal, or create a sequence file writer in Java code and append
> > >>>> key/values to it through the Hadoop APIs, does it internally transfer/write
> > >>>> data to HDFS serially or in parallel?
> > >>>>
> > >>>> Thanks in advance,
> > >>>> -JJ
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Joseph Echeverria
> > >>> Cloudera, Inc.
> > >>> 443.305.9434
> > >>
> > >
> > > --
> > > Harsh J
> >
>
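
The backgrounded shell loop suggested in the thread can be wrapped in a small
function that also waits for every copy and reports whether any of them failed.
This is only a sketch under stated assumptions: parallel_copy and COPY_CMD are
hypothetical names that do not appear in the thread, and concurrency is
unbounded, so a long file list starts one hadoop client process per file. Each
individual file is still written serially, as noted in the thread.

```shell
# Hypothetical helper, not part of Hadoop: run one copy command per file in the
# background, then wait for all of them and report any failure.
# COPY_CMD is an assumed override hook; by default it is the command from the thread.
COPY_CMD=${COPY_CMD:-"hadoop fs -copyFromLocal"}

# Usage: parallel_copy DEST_PATH FILE...
parallel_copy() {
  dest=$1
  shift
  pids=""
  for file in "$@"; do
    # $COPY_CMD is intentionally unquoted so it splits into command + arguments.
    $COPY_CMD "$file" "$dest" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    # wait PID returns that job's exit status, so failures are not lost.
    wait "$pid" || failed=1
  done
  return "$failed"
}

# Example call (paths are placeholders):
#   parallel_copy /user/jj/input part-0001.dat part-0002.dat
```

Collecting the process IDs and waiting on each one is what distinguishes this
from the bare loop: the script does not continue until all copies are done, and
a non-zero return signals that at least one copy needs to be retried.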
