hadoop-common-user mailing list archives

From "C.V.Krishnakumar" <cvkrishnaku...@me.com>
Subject Re: using 'fs -put' from datanode: all data written to that node's hdfs and not distributed
Date Tue, 13 Jul 2010 17:22:40 GMT
Oh. Thanks for the reply.
Regards,
Krishna
On Jul 13, 2010, at 9:51 AM, Allen Wittenauer wrote:

> 
> When you write on a machine running a datanode process, the data is *always* written
> locally first.  This is to provide an optimization to the MapReduce framework.  The lesson
> here is that you should *never* use a datanode machine to load your data.  Always do it outside
> the grid.
> 
> Additionally, you can use hadoop fsck (filename) -files -locations -blocks to see where those
> blocks have been written.
> 
> On Jul 13, 2010, at 9:45 AM, Nathan Grice wrote:
> 
>> To test the block distribution, run the same put command from the NameNode
>> and then again from the DataNode.
>> Check the HDFS filesystem after both commands. In my case, a 2GB file was
>> distributed mostly evenly across the datanodes when put was run on the
>> NameNode, but was placed only on the DataNode where I ran the put command
>> when put was run on that DataNode.
>> 
>> On Tue, Jul 13, 2010 at 9:32 AM, C.V.Krishnakumar <cvkrishnakumar@me.com> wrote:
>> 
>>> Hi,
>>> I am a newbie. I am curious to know how you discovered that all the blocks
>>> are written to datanode's hdfs? I thought the replication by namenode was
>>> transparent. Am I missing something?
>>> Thanks,
>>> Krishna
>>> On Jul 12, 2010, at 4:21 PM, Nathan Grice wrote:
>>> 
>>>> We are trying to load data into hdfs from one of the slaves and when the put
>>>> command is run from a slave (datanode) all of the blocks are written to the
>>>> datanode's hdfs, and not distributed to all of the nodes in the cluster. It
>>>> does not seem to matter what destination format we use (/filename vs
>>>> hdfs://master:9000/filename) it always behaves the same.
>>>> Conversely, running the same command from the namenode distributes the files
>>>> across the datanodes.
>>>> 
>>>> Is there something I am missing?
>>>> 
>>>> -Nathan
>>> 
>>> 
> 
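
The behaviour discussed in this thread can be illustrated with a small simulation. This is only a sketch, not HDFS code: it assumes a simplified, rack-unaware placement rule in which the first replica of each block goes to the writer's own node when the writer is a datanode, and to a randomly chosen datanode otherwise. The names choose_targets and place_file, the node names, and the figure of 32 blocks (roughly a 2GB file if the block size is 64 MB, which is an assumption) are all hypothetical.

```python
import random
from collections import Counter


def choose_targets(writer, datanodes, replication, rng):
    """Pick target datanodes for one block (simplified, rack-unaware sketch).

    Assumption: the first replica lands on the writer if the writer is
    itself a datanode; otherwise a datanode is chosen at random.
    """
    if writer in datanodes:
        first = writer
    else:
        first = rng.choice(datanodes)
    # Remaining replicas go to distinct other nodes, chosen at random.
    others = [node for node in datanodes if node != first]
    extra = rng.sample(others, min(replication - 1, len(others)))
    return [first] + extra


def place_file(writer, datanodes, num_blocks, replication=1, seed=0):
    """Return a Counter mapping each datanode to the replicas it holds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_blocks):
        counts.update(choose_targets(writer, datanodes, replication, rng))
    return counts


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical datanode names
    # put run on a datanode: every block's first replica stays on dn1
    print(place_file("dn1", nodes, num_blocks=32))
    # put run from a machine that is not a datanode: blocks spread out
    print(place_file("client", nodes, num_blocks=32))
```

With a replication factor of 1, the first call puts every block on dn1, which matches what Nathan observed, while the second call spreads the blocks over several nodes. With a higher replication factor the extra replicas still spread out, but the writing datanode ends up holding a copy of every block.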

