hadoop-hdfs-user mailing list archives

From Shekhar Sharma <shekhar2...@gmail.com>
Subject Re: HDFS data transfer is faster than SCP based transfer?
Date Sat, 25 Jan 2014 10:17:13 GMT
When you put or write data into HDFS, the client writes it in 64 KB chunks
and pushes each chunk through the datanode pipeline. This continues until
64 MB of data, the block size defined by the client, has been written.
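A minimal Java sketch of the chunking arithmetic described above. It assumes each packet carries a full 64 KB of data (real packets also carry headers and checksums) and a 64 MB block size, the Hadoop 1.x default. The class and method names (`HdfsWriteChunking`, `packetCount`, `blockSizes`) are hypothetical and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class HdfsWriteChunking {
    // Assumed sizes: 64 KB packets and 64 MB blocks (Hadoop 1.x defaults);
    // both are configurable on a real cluster.
    static final long PACKET_SIZE = 64L * 1024;
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    /** Number of packets needed to stream `bytes` bytes (ceiling division). */
    static long packetCount(long bytes, long packetSize) {
        return (bytes + packetSize - 1) / packetSize;
    }

    /** Sizes of the blocks a file of `bytes` bytes is cut into; the last block may be partial. */
    static List<Long> blockSizes(long bytes, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long remaining = bytes; remaining > 0; remaining -= blockSize) {
            blocks.add(Math.min(remaining, blockSize));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long fileSize = 150L * 1024 * 1024; // hypothetical 150 MB input file
        List<Long> blocks = blockSizes(fileSize, BLOCK_SIZE);
        for (int i = 0; i < blocks.size(); i++) {
            long b = blocks.get(i);
            // Packets are counted per block in this simplified model.
            System.out.printf("block %d: %d bytes in %d packets%n", i, b, packetCount(b, PACKET_SIZE));
        }
    }
}
```

For the hypothetical 150 MB file this yields two full 64 MB blocks of 1,024 packets each and a final 22 MB block of 352 packets.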

SCP, on the other hand, tries to buffer the entire data. Passing data in
small chunks is faster than passing it as one large piece.
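A hedged sketch of an idealized timing model for why streaming small chunks through a pipeline helps: with store-and-forward, every hop waits for the whole payload before forwarding it, while with packets the hops overlap. It assumes a fixed, hypothetical bandwidth per hop and ignores per-packet overhead, acknowledgements and disk I/O. `PipelineTiming` and its methods are illustrative names only.

```java
public class PipelineTiming {
    /**
     * Store-and-forward: each hop receives the whole payload before
     * forwarding it, so the full transfer time is paid once per hop.
     */
    static double storeAndForwardSeconds(double bytes, double bytesPerSecond, int hops) {
        return hops * (bytes / bytesPerSecond);
    }

    /**
     * Pipelined: the payload is cut into packets and each hop forwards a packet
     * as soon as it has received it. With n packets of time t over h hops the total
     * is (n + h - 1) * t, i.e. one payload time plus (h - 1) packet times.
     */
    static double pipelinedSeconds(double bytes, double packetBytes, double bytesPerSecond, int hops) {
        return bytes / bytesPerSecond + (hops - 1) * (packetBytes / bytesPerSecond);
    }

    public static void main(String[] args) {
        double payload = 64.0 * 1024 * 1024;    // one 64 MB block
        double packet = 64.0 * 1024;            // 64 KB packets
        double bandwidth = 100.0 * 1024 * 1024; // hypothetical 100 MB/s per hop
        int hops = 3;                           // client -> DN1 -> DN2 -> DN3
        System.out.printf("store-and-forward: %.5f s%n", storeAndForwardSeconds(payload, bandwidth, hops));
        System.out.printf("pipelined:         %.5f s%n", pipelinedSeconds(payload, packet, bandwidth, hops));
    }
}
```

With three hops (a replication factor of 3), the pipelined transfer costs roughly one payload time plus two packet times instead of three payload times; with a single hop the two models give the same result.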

Please check how writes happen in HDFS; that will give you a clear picture.
On 24 Jan 2014 10:56, "rab ra" <rabmdu@gmail.com> wrote:

> Hello
> I have a use case that requires transferring input files from remote
> storage using the SCP protocol (via the JSch jar). To optimize this use case, I
> pre-loaded all my input files into HDFS and modified my use case so
> that it copies the required files from HDFS. So, when a tasktracker runs, it
> copies the required input files from HDFS to its local directory. All
> my tasktrackers are also datanodes. I could see that my use case ran faster.
> The only modification in my application is that files are copied from HDFS
> instead of being transferred using SCP. Also, my use case involves parallel
> operations (run in the tasktrackers) that do a lot of file transfer. Now all
> these transfers are replaced with HDFS copies.
> Can anyone tell me why HDFS transfer is faster, as I observed? Is it because
> it uses TCP/IP? Can anyone give me reasonable explanations for the
> decrease in time?
> with thanks and regards
> rab
