hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: HDFS data transfer is faster than SCP based transfer?
Date Sat, 25 Jan 2014 10:39:11 GMT
There are many differences between the two, although both use TCP
underneath. Note in particular that SCP encrypts the data it transfers,
whereas a stock HDFS configuration does not.

You can also ask SCP to compress the data in transit via the "-C"
argument, btw. I'm unsure whether you already applied that before your
test; it may make a difference to the timings. Also, if security is not
a concern during the transfer, the encryption algorithm can be changed
to a weaker, less CPU-intensive one via "-c arcfour".
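
As a rough sketch of how the two options above combine on the scp
command line, the helper below only assembles the argument list. The
function name, host, and paths are made up for illustration, and whether
"arcfour" is accepted depends on your OpenSSH build (newer releases may
have dropped it; "ssh -Q cipher" lists what your client supports).

```python
import shlex


def build_scp_command(source, destination, compress=False, cipher=None):
    """Return the scp argument list for one transfer (hypothetical helper)."""
    command = ["scp"]
    if compress:
        # -C asks scp to compress the data in transit.
        command.append("-C")
    if cipher is not None:
        # -c selects the cipher used to encrypt the transfer.
        command.extend(["-c", cipher])
    command.extend([source, destination])
    return command


if __name__ == "__main__":
    # Placeholder host and paths, for illustration only.
    cmd = build_scp_command(
        "input.dat",
        "user@remote.example.com:/tmp/input.dat",
        compress=True,
        cipher="arcfour",
    )
    # Print the command instead of running it, so nothing is transferred.
    print(shlex.join(cmd))
```

Timing the same file with and without each option (for example by
passing the list to subprocess.run) would show how much of the SCP cost
comes from encryption and how much compression recovers.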

On Fri, Jan 24, 2014 at 10:55 AM, rab ra <rabmdu@gmail.com> wrote:
> Hello
>
> I have a use case that requires transferring input files from remote storage
> using the SCP protocol (via the JSch jar). To optimize this use case, I
> pre-loaded all my input files into HDFS and modified the application so that
> it copies the required files from HDFS instead. So, when a tasktracker runs,
> it copies the required input files from HDFS to its local directory. All my
> tasktrackers are also datanodes. I could see that my use case ran faster. The
> only modification to my application is that files are copied from HDFS instead
> of being transferred using SCP. Also, my use case involves parallel operations
> (run in tasktrackers) that do a lot of file transfers. All of these transfers
> are now replaced with HDFS copies.
>
> Can anyone tell me why the HDFS transfer is faster, as I observed? Is it
> because it uses TCP/IP? Can anyone give me plausible reasons for the
> decrease in time?
>
>
> with thanks and regards
> rab



-- 
Harsh J
