hadoop-hdfs-user mailing list archives

From 肖之 <zwx...@fudan.edu.cn>
Subject re: Even HDFS data distribution
Date Mon, 30 Nov 2009 02:24:46 GMT
You could upload these logs from a machine that is not a DataNode, such as the NameNode or a node outside the HDFS cluster. When the client writing the file is not itself a DataNode, the first replica of each block is placed on a randomly chosen DataNode rather than on the local machine, so the data spreads evenly.
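
For example, a rough sketch, assuming the Hadoop client and the cluster configuration are available on the NameNode (the host name "namenode" is a placeholder):

  # copy the log to the NameNode (or any non-DataNode client machine)
  scp /mnt/accesslog-agregated.2009-10-04.log namenode:/tmp/
  # upload from there; with no local DataNode, the first replica of each
  # block goes to a randomly chosen DataNode
  ssh namenode './hadoop fs -copyFromLocal /tmp/accesslog-agregated.2009-10-04.log /logs'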


From: Igor Katkov [mailto:ikatkov@gmail.com]
Sent: November 28, 2009 4:00
To: hdfs-user@hadoop.apache.org
Subject: Even HDFS data distribution


What is the usual approach/technique to achieve even HDFS data distribution?
I have a bunch of files (logs) outside of HDFS. If I copy them all to
a node within HDFS and then do something like

./hadoop fs -copyFromLocal /mnt/accesslog-agregated.2009-10-04.log /logs

it would write the first replica of each block locally and only then replicate it to some other node.
If I do that 100 times, most of the data will be sitting on the host I am
doing these operations on.

It would be nice to pick a host at random and store the very first block there.
The only workaround I can see immediately is to manually split these log
files into as many sets as I have HDFS nodes, upload/scp each set to a
different HDFS node, and then run ./hadoop fs -copyFromLocal there (see the sketch below).
This surely is a lot of manual work, so I guess there must be a trick
to make it happen with much less hassle.
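
Roughly, that manual workaround would look like this (a sketch only; node1, node2, the set1/set2 directories and /mnt/logs are placeholders):

  # split the logs into one set per DataNode and upload each set from that
  # node, so the first replicas end up spread across the cluster
  scp set1/*.log node1:/mnt/logs/
  ssh node1 './hadoop fs -copyFromLocal /mnt/logs/*.log /logs'
  scp set2/*.log node2:/mnt/logs/
  ssh node2 './hadoop fs -copyFromLocal /mnt/logs/*.log /logs'
  # ...and so on for every DataNode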


P.S. I googled it, but did not find any relevant discussions.
