hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: scp to namenode faster than dfs put?
Date Wed, 17 Sep 2008 18:30:30 GMT
pvvpr wrote:
> The time seemed to be around double the time taken to scp. Didn't realize
> it could be due to replication.

Twice as slow is not expected. One possibility is that your client is also 
one of the datanodes (i.e. you are reading from and writing to the same 


> Regarding dfs being faster than scp, the statement came more out of expectation
> (or wish list) rather than anything else. Since scp is the most elementary
> way of copying files, was thinking if the network topology of the cluster
> can be exploited in any way. The only intuition I had was there may be
> some approaches faster than scp, if any concepts from P2P file sharing are
> used here. Though I didn't fully explore P2P, I thought there may be some
> new developments in that area which may be useful here? After napster's
> centralized way of copying, I think there were quite a bit of
> improvements? Just thinking out loud.
> - Prasad.
>> How much slower is 'dfs -put' any way? How large is the file you are
>> copying?
>>  >  but shouldn't that
>>  > be at least as fast as copying data to namenode from a single machine,
>> It would be "at most" as fast as scp assuming you are not cpu bound. Why
>> would you think dfs would be faster even if it is copying to a single replica?
>> Raghu.
>> Dennis Kubes wrote:
>>> While an scp will copy data to the namenode machine, it does *not* store
>>>  the data in dfs, it simply copies the data to namenode machine.   This
>>> is the same as copying data to any other machine.  The data isn't in DFS
>>> and is not accessible from DFS.  If the box running the namenode fails
>>> you lose your data.
>>> The reason put is slower is that the data is actually being stored into
>>> the DFS on multiple machines in block format.  It is then accessible
>>> from programs accessing the DFS such as MR jobs.
>>> Dennis
>>> Prasad Pingali wrote:
>>>> Hello,
>>>>    I observe that scp of data to the namenode is faster than actually
>>>> putting into dfs (all nodes coming from same switch and have same
>>>> ethernet cards, homogenous nodes)? I understand that "dfs -put" breaks
>>>> the data into blocks and then copies to datanodes, but shouldn't that
>>>> be atleast as fast as copying data to namenode from a single machine,
>>>> if not faster?
>>>> thanks and regards,
>>>> Prasad Pingali,
>>>> IIIT Hyderabad.
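
To put rough numbers on the replies above, here is a minimal Python sketch of how much data a put has to write compared with an scp. It assumes a 64 MB block size and a replication factor of 3; neither value is stated in this thread, so substitute your cluster's settings. The function name dfs_put_footprint is made up for illustration. The sketch only counts blocks and bytes and does not model transfer time.

```python
def dfs_put_footprint(file_bytes, block_bytes=64 * 1024 * 1024, replication=3):
    """Return (number of blocks, total bytes stored across the cluster).

    block_bytes and replication are assumed values, not taken from the thread.
    """
    if file_bytes < 0 or block_bytes <= 0 or replication < 1:
        raise ValueError("invalid size, block size, or replication factor")
    # Integer ceiling division: a partial last block still counts as a block.
    blocks = (file_bytes + block_bytes - 1) // block_bytes
    # Every byte of the file is stored once per replica.
    total_stored = file_bytes * replication
    return blocks, total_stored


if __name__ == "__main__":
    one_gib = 1024 ** 3
    blocks, stored = dfs_put_footprint(one_gib)
    print(blocks, "blocks,", stored // one_gib, "GiB stored in DFS")
    print("scp stores", 1, "GiB on one machine")
```

Under these assumptions a 1 GiB file becomes 16 blocks and 3 GiB of stored data, whereas scp writes 1 GiB once to a single disk. With replication set to 1 the stored volume matches scp, which is the case where put can be at most as fast as scp.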
