hadoop-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: HDFS data transfer is faster than SCP based transfer?
Date Sat, 25 Jan 2014 14:16:33 GMT
There are no short-circuit writes, only reads, AFAIK.
Is it necessary to transfer from HDFS to local disk?  Can you read from HDFS directly using
the FileSystem interface?
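
A minimal sketch of reading from HDFS without first copying to local disk. The FileSystem interface mentioned above is a Java API; this sketch swaps in the `hdfs dfs -cat` command line and streams its output from Python, which approximates the same idea but is not the FileSystem API itself. The function names and the example path are hypothetical, and it assumes the `hdfs` client is on the PATH.

```python
import subprocess
from typing import Iterator, List


def hdfs_cat_command(hdfs_path: str) -> List[str]:
    """Build the argv that streams one HDFS file to stdout."""
    return ["hdfs", "dfs", "-cat", hdfs_path]


def stream_lines(argv: List[str]) -> Iterator[str]:
    """Run a command and yield its stdout line by line, with no temp file."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)


# Hypothetical usage (the path is an assumption, not taken from the thread):
# for line in stream_lines(hdfs_cat_command("/user/rab/input/part-0001")):
#     process(line)
```

Whether this helps depends on whether the application can consume a stream instead of a local file.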

From: Shekhar Sharma [mailto:shekhar2581@gmail.com]
Sent: Saturday, January 25, 2014 3:44 AM
To: user@hadoop.apache.org
Subject: Re: HDFS data transfer is faster than SCP based transfer?

We have the concept of short-circuit reads, which read block data directly from the local
disk, bypassing the data node, and so improve read performance. Is there a similar concept
of short-circuit writes?
On 25 Jan 2014 16:10, "Harsh J" <harsh@cloudera.com> wrote:
There's a lot of difference here. Both do use TCP underneath, but
note that SCP encrypts data in transit, whereas a stock HDFS
configuration does not.

You can also ask SCP to compress data transfer via the "-C" argument
btw - unsure if you already applied that pre-test - it may help show
up some difference. Also, the encryption algorithm can be changed to a
weaker one if security is not a concern during the transfer, via "-c
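
As a rough sketch of the two SCP options mentioned above, the helper below builds an scp command line with optional compression ("-C") and an optional cipher ("-c"). The helper name, the host and the paths are hypothetical. The message does not say which cipher to use, so the "aes128-ctr" value in the usage comment is only a placeholder; the ciphers actually available depend on the installed OpenSSH, and recent versions list them with `ssh -Q cipher`.

```python
from typing import List, Optional


def scp_command(source: str, dest: str, compress: bool = False,
                cipher: Optional[str] = None) -> List[str]:
    """Build an scp argv; nothing is executed here."""
    argv = ["scp"]
    if compress:
        argv.append("-C")        # compress data during the transfer
    if cipher is not None:
        argv += ["-c", cipher]   # select the encryption cipher
    argv += [source, dest]
    return argv


# Hypothetical usage (host, paths and cipher are assumptions):
# scp_command("input.dat", "worker1:/tmp/input.dat",
#             compress=True, cipher="aes128-ctr")
```

Timing the same transfer with and without these options would show how much of the gap comes from encryption and compression rather than from the protocol itself.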

On Fri, Jan 24, 2014 at 10:55 AM, rab ra <rabmdu@gmail.com> wrote:
> Hello
> I have a use case that requires transferring input files from remote storage
> using the SCP protocol (via the jSCH jar). To optimize it, I pre-loaded all
> my input files into HDFS and modified the application so that it copies the
> required files from HDFS instead. So, when a tasktracker runs a task, it
> copies the required input files from HDFS to its local directory. All my
> tasktrackers are also datanodes. I can see that my use case now runs faster.
> The only modification in my application is that files are copied from HDFS
> instead of being transferred using SCP. Also, my use case involves parallel
> operations (run in tasktrackers) that do a lot of file transfer. All of
> these transfers are now replaced with HDFS copies.
> Can anyone tell me why the HDFS transfer is faster, as I observed? Is it
> because it uses TCP/IP? Can anyone give me sound reasons that explain the
> decrease in time?
> with thanks and regards
> rab

Harsh J
