hadoop-common-user mailing list archives

From <hadoop.supp...@visolve.com>
Subject RE: Copy data between clusters during the job execution.
Date Tue, 03 Feb 2015 05:23:19 GMT
Judging by your first error message, the source directory argument was missing or slightly off.
One common usage of distcp is:

 

Distcp (solution to your problem)

hadoop distcp hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/some1

 

It is also wise to use the latest version of the tool:

distcp2

hadoop distcp hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/some1

 

Here hdfs://hadoop-coc-1:50070/input1 is the source directory, and hdfs://hadoop-coc-2:50070/some1 is the destination directory on the other cluster.

 

Optional:

If needed, you can provide multiple source directories in the same command, using “\” to continue the command across lines.
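
As a rough sketch of the multi-directory form: distcp treats every path except the last as a source and the last path as the destination. The second source directory (input2) and the HADOOP_CMD variable are assumptions added for illustration; the other URIs are copied from the commands above. Note that an hdfs:// URI normally points at the NameNode RPC port, while 50070 is the default NameNode web UI port in Hadoop 2.x, so the port may need adjusting for your clusters.

```shell
# Sketch: copy two source directories to one destination with a single distcp call.
# HADOOP_CMD is a hypothetical override; set it to "echo" to preview the command
# without touching either cluster.
copy_inputs() {
  "${HADOOP_CMD:-hadoop}" distcp \
    hdfs://hadoop-coc-1:50070/input1 \
    hdfs://hadoop-coc-1:50070/input2 \
    hdfs://hadoop-coc-2:50070/some1
}
```

Calling copy_inputs with HADOOP_CMD=echo prints the full command line, which is a cheap way to check the argument order before running the real copy.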

 

Thanks and Regards, 
S.RagavendraGanesh 
Hadoop Support Team 
ViSolve Inc. | www.visolve.com

 

 

From: dbist13@gmail.com [mailto:dbist13@gmail.com] On Behalf Of Artem Ervits
Sent: Tuesday, February 03, 2015 6:49 AM
To: user@hadoop.apache.org
Subject: Re: Copy data between clusters during the job execution.

 

Take a look at Oozie: once the first job completes, you can distcp to another server.

Artem Ervits

On Feb 2, 2015 5:46 AM, "Daniel Haviv" <danielrulez@gmail.com> wrote:

It should run after your job finishes.

You can create the flow using a simple bash script.
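
A minimal sketch of such a script, assuming the job is launched from the command line and that the copy should happen only if the job succeeds. The function name run_then_copy and the variables SRC, DST and HADOOP_CMD are hypothetical; the default URIs are taken from the distcp example earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: run a job, then copy its data to the other cluster with distcp.
set -eu

# Source and destination; defaults come from the thread, override as needed.
SRC="${SRC:-hdfs://hadoop-coc-1:50070/input1}"
DST="${DST:-hdfs://hadoop-coc-2:50070/some1}"

run_then_copy() {
  # "$@" is the command that launches the job.
  if "$@"; then
    # The job exited with status 0, so start the inter-cluster copy.
    "${HADOOP_CMD:-hadoop}" distcp "$SRC" "$DST"
  else
    echo "job failed; skipping distcp" >&2
    return 1
  fi
}
```

For example, `run_then_copy hadoop jar my-job.jar` would launch a (hypothetical) job jar and run distcp only when that command exits successfully.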

Daniel


On 2 Feb 2015, at 12:31, xeonmailinglist <xeonmailinglist@gmail.com> wrote:

But can I use distcp inside my job, or do I need to program something that executes distcp after
my job finishes?



On 02-02-2015 10:20, Daniel Haviv wrote:

You can use distcp.

 

Daniel

 

On 2 Feb 2015, at 11:12, 

 

