hadoop-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: copytolocal vs distcp
Date Sat, 09 Mar 2013 19:00:52 GMT
Try file:///fs4/outdir
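
The extra slash matters because of how a file: URL is split into parts. A minimal sketch using Python's standard URL parser (not Hadoop's own Path handling, which may differ in detail) shows that the two-slash form treats "fs4" as a host name rather than as a directory. The helper name split_file_url is made up for this illustration.

```python
from urllib.parse import urlparse

def split_file_url(url):
    """Return (authority, path) for a file: URL using generic URI rules."""
    parts = urlparse(url)
    return parts.netloc, parts.path

# Two slashes: "fs4" is parsed as the authority (host), not as a directory.
print(split_file_url("file://fs4/outdir/"))   # ('fs4', '/outdir/')

# Three slashes: empty authority, so the whole string is an absolute path.
print(split_file_url("file:///fs4/outdir"))   # ('', '/fs4/outdir')
```

With the three-slash form the destination is the absolute local path /fs4/outdir, which is what the original command was presumably aiming for.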

Symbolic links can also help.

Note that this file system has to be visible with the same path on all
hosts.  You may also be bandwidth limited by whatever is serving that file
system.

There are cases where you won't be limited by the file system.  MapR, for
instance, has a completely distributed NFS server, and specialized file
systems like Lustre might also spread network traffic across servers. If you
are just writing to a conventional NAS, however, this is unlikely to win much
relative to copytolocal, simply because the NAS itself becomes the bottleneck.
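
To put a rough number on the bottleneck, here is a back-of-the-envelope sketch. The figures are assumptions for illustration only: that "~6Tb" means about 6 decimal terabytes, that the NAS sits behind a single 1 Gbit/s link, and that protocol and disk overhead can be ignored. The function name transfer_hours is hypothetical.

```python
def transfer_hours(data_tb, link_gbit_per_s, links=1):
    """Hours to move data_tb terabytes over `links` parallel links, ignoring overhead."""
    bits = data_tb * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (link_gbit_per_s * 1e9 * links)
    return seconds / 3600

# Assumed: one 1 Gbit/s link in front of a conventional NAS.
print(round(transfer_hours(6, 1), 1))              # 13.3

# Assumed: 20 nodes, each writing over its own 1 Gbit/s link to distributed storage.
print(round(transfer_hours(6, 1, links=20), 1))    # 0.7
```

Under these assumptions, adding more distcp mappers does not help once the single link is saturated; the parallelism only pays off when the destination can absorb traffic on many links at once.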

On Sat, Mar 9, 2013 at 1:07 PM, John Meza <j_mezazap@hotmail.com> wrote:

> I need suggestions on the best methods of copying a lot of data (~6Tb) from a
> cluster (20-dn) to the local file system.
> While *distcp* has much more throughput compared to copytolocal (I think)
> because it uses MR jobs, it doesn't seem to work well with the following
> syntax
>    <desturl> =   "file://fs4/outdir/"
> Problem: It puts the output in the home dir for the linux user. To get this
> to work, do I need to redefine the user's home dir to the output dir (lun)
> with lotsa disk space?
> *copytolocal* is straightforward to use, but lacks the throughput (I
> think).
> Suggestions? Advice?
> thanks
> John
