hadoop-hdfs-user mailing list archives

From Mathias Herberts <mathias.herbe...@gmail.com>
Subject Re: Copy Vs DistCP
Date Sun, 14 Apr 2013 17:33:03 GMT
> This is absolutely true.  Distcp dominates cp for large copies.  On the
> other hand cp dominates distcp for convenience.
> In my own experience, I love cp when copying relatively small amounts of
> data (10's of GB) where the available bandwidth of about a GB/s allows the
> copy to complete in less time than it takes distcp to get started.
> At larger sizes (100's of GB and up), the startup time of distcp doesn't
> matter because once it gets going, it moves data much faster.

Maybe we could put together a 'fs -smartcp' which chooses wisely between
copy and distcp depending on file size.
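
A rough sketch of what such a 'smartcp' wrapper might look like, in Python.
No such command exists in Hadoop; the function names and the 100 GB cutoff
are assumptions made up for illustration, loosely based on the "10's of GB"
versus "100's of GB" figures quoted above. It relies only on the existing
'hadoop fs -du -s', 'hadoop fs -cp' and 'hadoop distcp' commands, and assumes
the first field printed by 'du -s' is the total size in bytes.

```python
import subprocess

# Hypothetical cutoff: below this, plain cp is assumed to finish before
# distcp would even get started. Tune it for your cluster.
DEFAULT_THRESHOLD_BYTES = 100 * 1024**3  # 100 GB


def choose_copy_command(total_bytes, src, dst,
                        threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Return the argv list for the copy tool suited to the data size."""
    if total_bytes < threshold_bytes:
        # Small copy: a single-client cp avoids the job startup cost.
        return ["hadoop", "fs", "-cp", src, dst]
    # Large copy: distcp spreads the work across the cluster.
    return ["hadoop", "distcp", src, dst]


def total_size_bytes(path):
    """Ask HDFS for the total size of path (first field of 'du -s')."""
    out = subprocess.check_output(
        ["hadoop", "fs", "-du", "-s", path], text=True)
    return int(out.split()[0])


def smartcp(src, dst):
    """Measure src, pick cp or distcp, run it, and return its exit code."""
    cmd = choose_copy_command(total_size_bytes(src), src, dst)
    return subprocess.call(cmd)
```

Calling smartcp("/data/in", "/data/out") would then run one of the two
commands. A real implementation would probably also want to take available
bandwidth and the number of files into account, not just the total size.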
