hadoop-common-user mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject Re: scp to namenode faster than dfs put?
Date Wed, 17 Sep 2008 14:41:58 GMT
An scp copies the data to the local filesystem of the machine running
the namenode; it does *not* store the data in DFS.  This is the same as
copying data to any other machine.  The data isn't in DFS and is not
accessible from DFS.  If the box running the namenode fails, you lose
your data.

The reason put is slower is that the data is actually being stored in
the DFS: it is split into blocks and the blocks are written to multiple
machines.  It is then accessible to programs that read from the DFS,
such as MR jobs.
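
As a rough illustration of the extra work a put does, here is a small
sketch that estimates how many blocks and how many bytes end up being
written.  The 64 MB block size and replication factor of 3 are
assumptions (neither is stated in this thread; check your own cluster
config), and the function name is made up for the example.

```python
def dfs_put_footprint(file_bytes, block_bytes=64 * 1024 * 1024, replication=3):
    """Estimate what storing a file in DFS has to write.

    block_bytes and replication are assumed values, not taken from
    any particular cluster configuration.
    """
    if file_bytes < 0 or block_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")

    # Integer ceiling division: a partial last block still counts as a block.
    blocks = (file_bytes + block_bytes - 1) // block_bytes

    return {
        "blocks": blocks,
        # Every block is stored on `replication` different machines.
        "block_replicas": blocks * replication,
        # Total bytes written across the cluster for this one file.
        "bytes_written_dfs": file_bytes * replication,
        # An scp writes the file once, to one local disk.
        "bytes_written_scp": file_bytes,
    }


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(dfs_put_footprint(one_gib))
```

Under those assumptions a 1 GiB file becomes 16 blocks and 48 block
replicas, so the cluster writes three times the bytes that an scp to a
single machine would.  This only counts bytes; it does not model
network or disk timing.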


Prasad Pingali wrote:
> Hello,
>    I observe that scp of data to the namenode is faster than actually putting 
> into dfs (all nodes coming from same switch and have same ethernet cards, 
> homogeneous nodes)? I understand that "dfs -put" breaks the data into blocks 
> and then copies to datanodes, but shouldn't that be at least as fast as 
> copying data to namenode from a single machine, if not faster?
> thanks and regards,
> Prasad Pingali,
> IIIT Hyderabad.
