hadoop-common-user mailing list archives

From jiang licht <licht_ji...@yahoo.com>
Subject Re: Copying files between two remote hadoop clusters
Date Fri, 05 Mar 2010 22:37:26 GMT
This is something that I asked about recently :)

Here's a list of what I can think of:

1. On the remote box holding the data: cat filetobesent | ssh hadoopmaster 'hadoop fs -put - dstinhdfs'

2. On the remote box holding the data, configure core-site.xml to set fs.default.name to hdfs://namenode:port,
then run "hadoop fs -copyFromLocal" or "hadoop fs -put" as usual. This works if the namenode is
reachable from your data box, either directly or through a VPN.

3. HDFS-aware GridFTP; you can read more detail about it in Brian Bockelman's reply:


4. You could write a data transfer tool that is HDFS-aware and runs on the data box: it reads data
locally, sends it over the network to a partner process on the cluster side, and that partner writes directly into Hadoop.

5. Any other ideas?
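Option 2 amounts to pointing the Hadoop client on the data box at the remote namenode. A minimal core-site.xml sketch; the hostname "namenode" and port 9000 are placeholders for your cluster's actual values:

```xml
<?xml version="1.0"?>
<!-- core-site.xml on the data box; "namenode" and 9000 are placeholders -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
```

With this in place, a plain "hadoop fs -put localfile /path/in/hdfs" writes straight into the remote cluster. Note that the client writes blocks to datanodes directly, so the datanodes must be reachable from the data box as well, not just the namenode.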
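Option 1 above can be sketched as a shell pipeline. The hostname "hadoopmaster" and all paths are placeholders; the key point is that "hadoop fs -put -" reads from stdin, so the file is never staged on the cluster's local disk. Since a real cluster isn't available here, a local "cat > ..." stands in for the remote hadoop side so the pattern itself can be exercised:

```shell
#!/bin/sh
# The real pipeline, with placeholder names, would be:
#
#   cat /data/filetobesent | ssh hadoopmaster 'hadoop fs -put - /user/me/dstinhdfs'
#
# Stand-in demonstration of the same stdin-streaming pattern, no cluster needed:
printf 'hello hdfs\n' > /tmp/filetobesent

# "sh -c 'cat > ...'" plays the role of: ssh hadoopmaster 'hadoop fs -put - dstinhdfs'
cat /tmp/filetobesent | sh -c 'cat > /tmp/dstinhdfs'
```

For large files it may be worth compressing in flight (e.g. enabling ssh's -C option), since the data crosses the network exactly once either way.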



--- On Fri, 3/5/10, zenMonkey <numan.salati@gmail.com> wrote:

From: zenMonkey <numan.salati@gmail.com>
Subject: Copying files between two remote hadoop clusters
To: hadoop-user@lucene.apache.org
Date: Friday, March 5, 2010, 4:25 PM

I want to write a script that pulls data (flat files) from a remote machine
and pushes that into its hadoop cluster.

At the moment, it is done in two steps:

1 - Secure copy the remote files
2 - Put the files into HDFS

I was wondering if it is possible to optimize this by avoiding the copy to the
local fs and instead writing directly to HDFS. I am not sure if this is
something that the Hadoop tools already provide.

Thanks for any help.

View this message in context: http://old.nabble.com/Copying-files-between-two-remote-hadoop-clusters-tp27799963p27799963.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
