hadoop-hdfs-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Copy Vs DistCP
Date Sun, 14 Apr 2013 18:01:18 GMT
On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts <mathias.herberts@gmail.com> wrote:

> > This is absolutely true.  Distcp dominates cp for large copies.  On the
> > other hand, cp dominates distcp for convenience.
> >
> > In my own experience, I love cp when copying relatively small amounts of
> > data (10's of GB) where the available bandwidth of about a GB/s allows the
> > copy to complete in less time than it takes distcp to get started.
> >
> > At larger sizes (100's of GB and up), the startup time of distcp doesn't
> > matter because once it gets going, it moves data much faster.
>
> Maybe we could put together a 'fs -smartcp' which chooses wisely between
> copy and distcp depending on file size.

Uh... hmm...

This is a good suggestion.  Obvious in fact.  In retrospect.

I would also suggest that the new command be called "distcp".
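
A minimal sketch of the size-based choice discussed in this thread, for illustration only. The function name `choose_copy_command` and the 50 GiB cutoff are assumptions, picked to sit between the "10's of GB" and "100's of GB" ranges quoted above; neither comes from Hadoop itself. The sketch only builds the argument list for `hadoop fs -cp` or `hadoop distcp` and leaves measuring the source size and running the command to the caller.

```python
# Hypothetical cutoff between "10's of GB" and "100's of GB"; tune per cluster.
DEFAULT_THRESHOLD_BYTES = 50 * 1024**3


def choose_copy_command(total_bytes, src, dst,
                        threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Return the argv list for the copy tool suited to the data size."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if total_bytes < threshold_bytes:
        # Small copy: a single-client copy can finish before a distcp
        # job would have started.
        return ["hadoop", "fs", "-cp", src, dst]
    # Large copy: the job startup cost is amortized by parallel transfer.
    return ["hadoop", "distcp", src, dst]
```

The returned list can be passed to `subprocess.run`. The right cutoff depends on client bandwidth and job startup latency on a given cluster, so it is best treated as a tunable.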
