hadoop-common-user mailing list archives

From Mapred Learn <mapred.le...@gmail.com>
Subject Re: Are hadoop fs commands serial or parallel
Date Wed, 18 May 2011 17:05:24 GMT
Thanks, Patrick!
This would work when a directory of files is to be uploaded, but I guess it would not
work for streaming data.

Sent from my iPhone

On May 18, 2011, at 9:39 AM, Patrick Angeles <patrick@cloudera.com> wrote:

> kinda clunky but you could do this via shell:
> 
> for FILE in $LIST_OF_FILES ; do
>  hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" &
> done
> wait   # block until every background copy has finished
> 
> If doing this via the Java API, then, yes you will have to use multiple
> threads.
> 
> On Wed, May 18, 2011 at 1:04 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
> 
>> Thanks, Harsh!
>> So basically both the APIs and the hadoop client commands write
>> serially.
>> I was wondering what other ways there are to write data to HDFS in parallel,
>> other than using multiple parallel threads.
>> 
>> Thanks,
>> JJ
>> 
>> Sent from my iPhone
>> 
>> On May 17, 2011, at 10:59 PM, Harsh J <harsh@cloudera.com> wrote:
>> 
>>> Hello,
>>> 
>>> Adding to Joey's response, copyFromLocal's current implementation is serial
>>> given a list of files.
>>> 
>>> On Wed, May 18, 2011 at 9:57 AM, Mapred Learn <mapred.learn@gmail.com>
>>> wrote:
>>>> Thanks, Joey!
>>>> I will try to find out about copyFromLocal. It looks like the Hadoop APIs write
>>>> serially, as you pointed out.
>>>> 
>>>> Thanks,
>>>> -JJ
>>>> 
>>>> On May 17, 2011, at 8:32 PM, Joey Echeverria <joey@cloudera.com> wrote:
>>>> 
>>>>> The sequence file writer definitely does it serially as you can only
>>>>> ever write to the end of a file in Hadoop.
>>>>> 
>>>>> Doing copyFromLocal could write multiple files in parallel (I'm not
>>>>> sure if it does or not), but a single file would be written serially.
>>>>> 
>>>>> -Joey
>>>>> 
>>>>> On Tue, May 17, 2011 at 5:44 PM, Mapred Learn <mapred.learn@gmail.com> wrote:
>>>>>> Hi,
>>>>>> My question is: when I run a command from the HDFS client, e.g. hadoop fs
>>>>>> -copyFromLocal, or create a sequence file writer in Java code and append
>>>>>> key/values to it through the Hadoop APIs, does it internally transfer/write
>>>>>> data to HDFS serially or in parallel?
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> -JJ
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Joseph Echeverria
>>>>> Cloudera, Inc.
>>>>> 443.305.9434
>>>> 
>>> 
>>> --
>>> Harsh J
>> 
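
Patrick's loop starts one background copy per file, so a long list launches as many processes at once. A minimal sketch of a bounded variant follows, assuming an xargs that supports -0 and -P (GNU and BSD xargs both do). The names parallel_copy, MAX_JOBS, and COPY_CMD are hypothetical and do not come from the thread or from Hadoop. Each file is still written serially, as Joey describes; only separate files are copied concurrently.

```shell
# Sketch only: parallel_copy, MAX_JOBS and COPY_CMD are illustrative names,
# not part of Hadoop. Requires an xargs with -0 and -P (GNU or BSD).

# Copy the given local files to DEST, running at most MAX_JOBS copies at once.
# Usage: parallel_copy DEST FILE...
parallel_copy() {
  local dest="$1"
  shift
  # Nothing to copy: avoid handing xargs a single empty argument.
  [ "$#" -gt 0 ] || return 0
  # COPY_CMD is deliberately unquoted so it splits into command + arguments.
  # NUL-delimited input keeps file names containing spaces intact.
  printf '%s\0' "$@" |
    xargs -0 -P "${MAX_JOBS:-4}" -I{} ${COPY_CMD:-hadoop fs -copyFromLocal} {} "$dest"
}
```

For example, `MAX_JOBS=8 parallel_copy "$DEST_PATH" $LIST_OF_FILES` would run up to eight copies at a time. The function returns a non-zero status if xargs reports that any copy failed.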
