hadoop-common-user mailing list archives

From Dieter Plaetinck <dieter.plaeti...@intec.ugent.be>
Subject Re: Are hadoop fs commands serial or parallel
Date Fri, 20 May 2011 11:10:47 GMT
What do you mean clunky?
IMHO this is a quite elegant, simple, working solution.
Sure this spawns multiple processes, but it beats any
api-overcomplications, imho.

Dieter


On Wed, 18 May 2011 11:39:36 -0500
Patrick Angeles <patrick@cloudera.com> wrote:

> kinda clunky but you could do this via shell:
> 
> for FILE in $LIST_OF_FILES ; do
>   hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" &
> done
> wait  # block until every background copy has finished
> 
> If doing this via the Java API, then, yes you will have to use
> multiple threads.
> 
> On Wed, May 18, 2011 at 1:04 AM, Mapred Learn
> <mapred.learn@gmail.com> wrote:
> 
> > Thanks harsh !
> > That means basically both APIs as well as hadoop client commands
> > allow only serial writes.
> > I was wondering what could be other ways to write data in parallel
> > to HDFS other than using multiple parallel threads.
> >
> > Thanks,
> > JJ
> >
> > Sent from my iPhone
> >
> > On May 17, 2011, at 10:59 PM, Harsh J <harsh@cloudera.com> wrote:
> >
> > > Hello,
> > >
> > > Adding to Joey's response, copyFromLocal's current implementation
> > > is serial given a list of files.
> > >
> > > On Wed, May 18, 2011 at 9:57 AM, Mapred Learn
> > > <mapred.learn@gmail.com> wrote:
> > >> Thanks Joey !
> > >> I will try to find out about copyFromLocal. Looks like Hadoop APIs
> > >> write serially as you pointed out.
> > >>
> > >> Thanks,
> > >> -JJ
> > >>
> > >> On May 17, 2011, at 8:32 PM, Joey Echeverria <joey@cloudera.com>
> > >> wrote:
> > >>
> > >>> The sequence file writer definitely does it serially as you can
> > >>> only ever write to the end of a file in Hadoop.
> > >>>
> > >>> Doing copyFromLocal could write multiple files in parallel (I'm
> > >>> not sure if it does or not), but a single file would be written
> > >>> serially.
> > >>>
> > >>> -Joey
> > >>>
> > >>> On Tue, May 17, 2011 at 5:44 PM, Mapred Learn
> > >>> <mapred.learn@gmail.com> wrote:
> > >>>> Hi,
> > >>>> My question is: when I run a command from the HDFS client, e.g.
> > >>>> hadoop fs -copyFromLocal, or create a sequence file writer in
> > >>>> Java code and append key/values to it through the Hadoop APIs,
> > >>>> does it internally transfer/write data to HDFS serially or in
> > >>>> parallel?
> > >>>>
> > >>>> Thanks in advance,
> > >>>> -JJ
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Joseph Echeverria
> > >>> Cloudera, Inc.
> > >>> 443.305.9434
> > >>
> > >
> > > --
> > > Harsh J
> >
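
A note on the shell approach quoted above: the loop starts one hadoop process per file with no upper bound, which may be more than the client machine can comfortably handle for a long file list. The following is only a sketch of one way to cap that, assuming a POSIX shell and the same hadoop fs -copyFromLocal command used in the thread. The names parallel_put and MAX_JOBS are hypothetical and not part of Hadoop. It copies in batches: it starts up to MAX_JOBS copies, waits for that whole batch, then starts the next one.

```shell
# Sketch only: parallel_put and MAX_JOBS are hypothetical names, not Hadoop features.
# Usage: parallel_put DEST FILE...
# The function body runs in a subshell, so its variables do not leak to the caller.
parallel_put() (
  dest=${1:?usage: parallel_put DEST FILE...}
  shift
  max_jobs=${MAX_JOBS:-4}   # assumed default batch size
  pids=""
  running=0
  failed=0
  for file in "$@"; do
    # Each copy is still serial for its own file; only separate files overlap.
    hadoop fs -copyFromLocal "$file" "$dest" &
    pids="$pids $!"
    running=$((running + 1))
    # Once a full batch is running, wait for all of it and record any failure.
    if [ "$running" -ge "$max_jobs" ]; then
      for pid in $pids; do wait "$pid" || failed=1; done
      pids=""
      running=0
    fi
  done
  # Wait for the final, possibly partial, batch.
  for pid in $pids; do wait "$pid" || failed=1; done
  exit "$failed"
)
```

It could be called as MAX_JOBS=8 parallel_put "$DEST_PATH" $LIST_OF_FILES, reusing the variables from the quoted loop. The exit status is non-zero if any single copy failed. Batching is the simplest scheme to write portably; a slow file holds up its whole batch, so a queue-style scheduler would keep the machine busier at the cost of more code.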

