hadoop-hdfs-user mailing list archives

From John Meza <j_meza...@hotmail.com>
Subject RE: copytolocal vs distcp
Date Sat, 09 Mar 2013 19:17:27 GMT
The file:///fs4/outdir solved the outfile location issue. Dhaval Shah made the same suggestion.
That's good. But I'm getting Map exceptions now. Given your comment about conventional NAS, this
may all be for naught. Let me describe my -planned- workflow:
- export data from hdfs to local-dir (which is a directory on a lun off my Netapp filer)
- copy to portable disk array, send to cloud provider
- import to hdfs

Q: Do all Maps output to local dirs on each datanode?
Q: 20 datanodes writing to the same lun will have multiple issues:
  - possible directory naming collisions?
  - bottleneck at controller on filer? I think yes.
Q: Should I just start using copytolocal now? Hopefully it will complete by Monday am.

From: tdunning@maprtech.com
Date: Sat, 9 Mar 2013 14:00:52 -0500
Subject: Re: copytolocal vs distcp
To: user@hadoop.apache.org

Try file:///fs4/outdir
Symbolic links can also help.
Note that this file system has to be visible with the same path on all hosts.  You may also
be bandwidth limited by whatever is serving that file system.
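
One plausible reading of why the two-slash form misbehaved is plain URI syntax: in file://fs4/outdir/ the first component after the double slash is parsed as an authority (host), not as part of the path. The sketch below uses Python's standard urllib only to illustrate that generic rule; Hadoop's own path handling is not shown here and may differ in detail. The helper name `describe` is made up for this example.

```python
from urllib.parse import urlparse

def describe(uri):
    """Return (authority, path) as generic URI parsing sees them."""
    parts = urlparse(uri)
    return parts.netloc, parts.path

# Two slashes: "fs4" is taken as the authority, leaving only "/outdir/" as the path.
print(describe("file://fs4/outdir/"))

# Three slashes: the authority is empty and the full absolute path survives.
print(describe("file:///fs4/outdir"))
```

With three slashes the destination is an absolute path on the local file system, which matches the fix suggested above.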

There are cases where you won't be limited by the file system.  MapR, for instance, has a
completely distributed NFS server and specialized file systems like lustre might also have
distributed network traffic. If you are just writing to a conventional NAS, however, this
is unlikely to win much relative to copytolocal simply due to bottlenecking.
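
To put the bottleneck point in numbers, here is a rough back-of-the-envelope sketch. The ~6 TB figure comes from the original question; the throughput values are assumptions chosen only for illustration (they are not measurements from this cluster or filer), and `copy_hours` is a hypothetical helper.

```python
def copy_hours(total_bytes, bytes_per_second):
    """Hours needed to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / 3600

TOTAL_BYTES = 6 * 10**12  # ~6 TB from the thread, taken as decimal terabytes

# Assumed sustained rates at the single shared sink (the NAS), in bytes/second.
ASSUMED_RATES = {
    "50 MB/s": 50e6,
    "110 MB/s (roughly a saturated 1 GbE link)": 110e6,
}

for label, rate in ASSUMED_RATES.items():
    print(f"{label}: {copy_hours(TOTAL_BYTES, rate):.1f} hours")
```

If every writer funnels into the same filer, the filer's sustained rate sets the finish time no matter how many map tasks are sending data.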

On Sat, Mar 9, 2013 at 1:07 PM, John Meza <j_mezazap@hotmail.com> wrote:

I need suggestions on the best methods of copying a lot of data (~6 TB) from a cluster (20 datanodes)
to the local file system.

While distcp has much more throughput compared to copytolocal (I think) because it uses MR
jobs, it doesn't seem to work well with the following syntax: <desturl> = "file://fs4/outdir/"

Problem: It puts the output in the home dir of the linux user. To get this to work, do I need to redefine
the user's home dir to the output dir (lun) with lotsa disk space?

copytolocal is straightforward to use, but lacks the throughput (I think).

Suggestions? Advice?
thanks
John
