Subject: Re: HDFS data transfer is faster than SCP based transfer?
From: Shekhar Sharma <shekhar2581@gmail.com>
To: user@hadoop.apache.org
Date: Sat, 25 Jan 2014 15:47:13 +0530

When you put (write) data into HDFS, the client writes it in small packets of roughly 64 KB, which are pushed through the replication pipeline; this continues until 64 MB, the block size defined by the client, has been written.

scp, on the other hand, tries to buffer the entire file. Passing small chunks of data through a pipeline is faster than passing one large buffer.

Please check how writes happen in HDFS; that will give you a clearer picture.

On 24 Jan 2014 10:56, "rab ra" <rabmdu@gmail.com> wrote:
> Hello
>
> I have a use case that requires transferring input files from remote
> storage over SCP (using the JSch jar). To optimize this, I pre-loaded
> all my input files into HDFS and modified the use case so that it
> copies the required files from HDFS instead. Now, when the tasktrackers
> run, they copy the input files they need to their local directory from
> HDFS. All my tasktrackers are also datanodes. The use case runs
> noticeably faster; the only change in the application is that files are
> copied from HDFS rather than transferred with SCP. The use case also
> involves parallel operations (run in the tasktrackers) that do a lot of
> file transfer, and all of those transfers are now HDFS copies.
>
> Can anyone tell me why the HDFS transfer is faster, as I witnessed? Is
> it because it uses TCP/IP? Can anyone give me reasonable reasons to
> support the decrease in time?
>
> with thanks and regards
> rab
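To make the pipelined-write idea above concrete, here is a minimal sketch. This is plain Python, not the actual HDFS client; the packet and block sizes mirror the 64 KB / 64 MB figures from the answer, and `chunked_copy` / `buffered_copy` are hypothetical helper names for illustration only:

```python
import io

PACKET_SIZE = 64 * 1024          # HDFS streams writes in ~64 KB packets
BLOCK_SIZE = 64 * 1024 * 1024    # default block size in Hadoop 1.x

def chunked_copy(src, dst, packet_size=PACKET_SIZE):
    """Stream src to dst one packet at a time, in the spirit of the HDFS
    write pipeline: memory use stays at one packet regardless of file size,
    and each packet can be forwarded downstream while the next is read."""
    copied = 0
    while True:
        packet = src.read(packet_size)
        if not packet:
            break
        dst.write(packet)
        copied += len(packet)
    return copied

def buffered_copy(src, dst):
    """Read the whole source into memory first, then write it out --
    the 'buffer everything' behaviour the answer attributes to scp."""
    data = src.read()
    dst.write(data)
    return len(data)

if __name__ == "__main__":
    payload = b"x" * (256 * 1024)            # 256 KB sample payload
    out1, out2 = io.BytesIO(), io.BytesIO()
    n1 = chunked_copy(io.BytesIO(payload), out1)
    n2 = buffered_copy(io.BytesIO(payload), out2)
    assert out1.getvalue() == out2.getvalue() == payload
    print(n1, n2)  # both 262144
```

Both helpers move the same bytes; the difference is that the chunked version never holds more than one packet in memory, which is what lets the real HDFS pipeline overlap reading, forwarding to the next datanode, and writing to disk.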