hadoop-common-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: DistCP : Is it gauranteed to work for any two uri schemes?
Date Tue, 04 Feb 2014 14:13:33 GMT
Just as Harsh pointed out, as long as the underlying DFS implements all of the file system APIs
that Hadoop requires, DistCp should work. One caveat: all of the required libraries (including any
conf files) need to be on the classpath if they are not already available on the runtime cluster.
The S3 file system works fine with DistCp, and in the same way our project copied TBs of data between
CFS (Cassandra DFS) and HDFS with DistCp when we migrated the data from that DFS to HDFS.
Yong
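
For illustration only, here is a minimal Python sketch of what a cross-scheme copy could look like from the client side. It just assembles the `hadoop distcp <src> <dst>` command line and checks that both arguments are fully qualified URIs. The helper name `build_distcp_command`, the bucket, and the NameNode host are hypothetical placeholders and not part of the original messages. The copy itself only succeeds if the file system classes and conf files for both schemes are on the classpath, as described above.

```python
import shlex
from urllib.parse import urlparse


def build_distcp_command(src, dst, extra_args=()):
    """Return the argv list for `hadoop distcp [extra_args] src dst`.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    for uri in (src, dst):
        # DistCp picks the FileSystem implementation from the URI scheme,
        # so require an explicit scheme on both ends.
        if not urlparse(uri).scheme:
            raise ValueError(f"not a fully qualified URI: {uri!r}")
    return ["hadoop", "distcp", *extra_args, src, dst]


if __name__ == "__main__":
    # Placeholder endpoints; replace with real URIs for your clusters.
    cmd = build_distcp_command(
        "s3n://example-bucket/logs",
        "hdfs://namenode.example.com:8020/logs",
    )
    # Print the command instead of executing it, so the sketch runs
    # on a machine without Hadoop installed.
    print(shlex.join(cmd))
```

Passing `extra_args=("-update", "-skipcrccheck")` would add DistCp's options for skipping the checksum comparison, which may help when the two file systems do not expose comparable checksums. Whether that is appropriate depends on the file systems involved.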

> From: harsh@cloudera.com
> Date: Tue, 4 Feb 2014 19:03:15 +0530
> Subject: Re: DistCP : Is it gauranteed to work for any two uri schemes?
> To: user@hadoop.apache.org
> 
> Overall the whole DistCp utility is devoid of any HDFS specific items,
> but does have some (mostly skippable) checks pertaining to FS level
> features such as permissions, checksums, etc. It should and does work
> with any valid URI scheme that the libraries understand to be valid
> FSes today.
> 
> On Tue, Feb 4, 2014 at 8:03 AM, Jay Vyas <jayunit100@gmail.com> wrote:
> > Hi folks:
> >
> > I've been thinking about the AWS S3DistCP class and am wondering: is distcp
> > built to work between any two Hadoop file system classes?
> >
> > Or is it implicitly built mainly to copy between two HDFS file
> > systems?
> >
> > I haven't found many examples online with different URI schemes.
> >
> > With emerging HDFS alternatives, I'd be interested in ways to optimize IO
> > between different filesystems using distcp.
> >
> > --
> > Jay Vyas
> > http://jayunit100.blogspot.com
> 
> 
> 
> -- 
> Harsh J
 		 	   		  