hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Solving the "hang" problem in dfs -copyToLocal/-cat...
Date Wed, 27 Feb 2008 23:43:28 GMT

It is read-only.

I started a fix to add posting, but didn't finish it.


On 2/27/08 2:59 PM, "C G" <parallelguy@yahoo.com> wrote:

> I think HTTP access is read-only...you'll need to continue to use
> copyFromLocalFile
>    
>   C G
>   
> 
> Phillip Wu <pwu@helio.com> wrote:
>   Very helpful information.
> 
> Is there any way to put files into DFS remotely, such as an HTTP POST?
> Or do I have to keep using copyFromLocalFile?
> 
> 
> Thanks,
> 
> Phil
> 
> mobile . 626.234.7515 . yim . heliophillip
> www.helio.com
> -----Original Message-----
> From: C G [mailto:parallelguy@yahoo.com]
> Sent: Wednesday, February 27, 2008 2:46 PM
> To: core-user@hadoop.apache.org
> Subject: RE: Solving the "hang" problem in dfs -copyToLocal/-cat...
> 
> I haven't looked at the source code to see how -cat is implemented, but
> I was pretty surprised at the results as well. When I sat down to do
> this experiment I figured I was wasting my time. Surprisingly, I was not.
> 
> C G
> 
> Joydeep Sen Sarma wrote:
> This is amazing ..
> 
> Wouldn't dfs -cat use the same dfs client codepath that an actual
> map-reduce program would? (If so, should it also start using http client
> instead? (at least for the non-local case))
> 
> Or maybe it already does?
> 
> -----Original Message-----
> From: Ted Dunning [mailto:tdunning@veoh.com]
> Sent: Wednesday, February 27, 2008 12:10 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Solving the "hang" problem in dfs -copyToLocal/-cat...
> 
> 
> Have you tried using http to fetch the file instead?
> 
> http:///data/
> 
> This will get redirected to one of the datanodes to handle and should be
> pretty fast. It would be interesting to find out if this alternative path
> is subject to the same hangs that you are seeing.
> 
> 
> On 2/27/08 12:05 PM, "C G" wrote:
> 
>> Hi All:
>> 
>> The following write-up is offered to help out anybody else who has seen
>> performance problems and "hangs" while using dfs -copyToLocal/-cat.
>> 
>> One of the performance problems that has been causing big problems for us
>> has been using the dfs commands -copyToLocal and -cat to move data from HDFS
>> to a local file system. We do this in order to populate a data warehouse that
>> is HDFS-unaware.
>> 
>> The "pattern" I've been using is:
>> 
>> rm -f loadfile.dat
>> fileList=`bin/hadoop dfs -ls /foo | grep part | awk '{print $1}'`
>> for x in ${fileList}
>> do
>>     bin/hadoop dfs -cat ${x} >> loadfile.dat
>> done
>> 
>> This pattern repeats several times, ultimately cat-ing 353 files into
>> several load files. This process is extremely slow, often taking 20-30
>> minutes to transfer 142M of data. More frustrating is that the system simply
>> "pauses" during cat operations. There is no I/O activity, no CPU activity,
>> nothing written to the log files on any node. Things just stop. I changed
>> the pattern to use -copyToLocal instead of -cat and had the same results. We
>> observe this "pause" behavior without respect for where the -copyToLocal or
>> -cat originates - I've tried running directly on the grid, and also directly
>> on the DB server which is not part of the grid proper. I've tried many
>> different releases of Hadoop, including 0.16.0, and all exhibit this problem.
>> 
>> I decided to try a different approach and use the HTTP interface to the
>> namenode to transfer the data:
>> 
>> rm -f loadfile.dat
>> fileList=`bin/hadoop dfs -ls /foo | grep part | awk '{print $1}'`
>> for x in ${fileList}
>> do
>>     wget -q http://mynamenodeserver:50070/data${x}
>> done
>> 
>> There is a trivial step to merge the individual part files into one file
>> preparatory for loading data.
>> 
>> I ran this experiment across 10,850 files containing an aggregate total of
>> 4.6G of data. It ran in under 2 hours, which while not great is significantly
>> better than the 18 hours it previously took -copyToLocal/-cat to run.
>> 
>> I found it surprising that this solution works better than
>> -copyToLocal/-cat.
>> 
>> Hope this helps...
>> C G

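For anyone following the wget approach quoted above: the merge step C G
describes as trivial is not shown in the thread. A minimal sketch of one way
it might look is below. It assumes the downloaded files keep the usual
part-NNNNN names and sit together in one local directory; merge_parts is a
hypothetical helper name, not anything shipped with Hadoop.

```shell
#!/bin/sh
# Sketch only: merge_parts is a hypothetical helper, not part of Hadoop.
# Concatenates every file named part-* in a directory into one load file.
# Usage: merge_parts <download-dir> <load-file>
merge_parts() {
    src_dir=$1
    out_file=$2

    # Start from an empty load file, as the quoted loops do with "rm -f".
    : > "$out_file" || return 1

    # The shell expands the glob in sorted order, so part-00000 is
    # appended before part-00001, and so on.
    for part in "$src_dir"/part-*; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -f "$part" ] || continue
        cat "$part" >> "$out_file" || return 1
    done
}
```

Run from the directory where wget saved the files, this would be invoked as
"merge_parts . loadfile.dat". The load file name must not itself match
part-*, or it would be picked up by the glob.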
