hadoop-common-user mailing list archives

From Urckle <urc...@gmail.com>
Subject Most Common ways to load data into Hadoop in production systems
Date Wed, 21 Jul 2010 16:30:53 GMT
Hi, I have a newbie question.

Scenario:
Hadoop version: 0.20.2
MR coding will be done in Java.


I'm just starting out with my first Hadoop setup. I would like to know whether
there are any best practices for loading data into HDFS. While setting things
up I have (obviously) put data files into HDFS manually using the shell
commands, but going forward I will need to retrieve large numbers of data
feeds from remote, third-party locations and load them into Hadoop for later
analysis. What is the best way to automate this? Is it to gather the retrieved
files into known locations to be mounted, and then automate putting them into
HDFS via a script or similar? Or is there some other practice? I've not been
able to find a specific use case yet; all the docs cover the basic fs commands
without giving much detail about more advanced setups.
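
To make the "known location plus script" idea concrete, here is a rough sketch
of what I have in mind. It is only an illustration, not a recommended design.
Assumptions: the class name HdfsStagingLoader, the ".tmp" suffix for files
that are still being downloaded, and the two command-line arguments are all
made up for this example; the only Hadoop piece used is the standard
"hadoop fs -put" shell command, which must be on the PATH.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pushes finished files from a local staging
// directory into HDFS by shelling out to "hadoop fs -put".
public class HdfsStagingLoader {

    // Builds the argument list for one "hadoop fs -put <local> <hdfsDir>" call.
    public static List<String> putCommand(String localFile, String hdfsDir) {
        return Arrays.asList("hadoop", "fs", "-put", localFile, hdfsDir);
    }

    // Returns regular files in the staging directory, skipping files that
    // still carry the (assumed) ".tmp" suffix of an in-progress download.
    public static List<File> completedFiles(File stagingDir) {
        List<File> result = new ArrayList<File>();
        File[] entries = stagingDir.listFiles();
        if (entries == null) {
            return result; // not a directory, or not readable
        }
        for (File f : entries) {
            if (f.isFile() && !f.getName().endsWith(".tmp")) {
                result.add(f);
            }
        }
        return result;
    }

    // Runs one put per completed file. The local copy is deleted only when
    // the hadoop command exits with status 0, so failed files are retried
    // on the next run.
    public static void loadAll(File stagingDir, String hdfsDir)
            throws IOException, InterruptedException {
        for (File f : completedFiles(stagingDir)) {
            Process p = new ProcessBuilder(putCommand(f.getPath(), hdfsDir))
                    .inheritIO()
                    .start();
            if (p.waitFor() == 0) {
                f.delete();
            } else {
                System.err.println("put failed, keeping " + f.getPath());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: HdfsStagingLoader <stagingDir> <hdfsDir>");
            return;
        }
        loadAll(new File(args[0]), args[1]);
    }
}
```

Something like this could be run from cron against the directory where the
feed downloaders drop their files. Whether shelling out like this is sensible
at all, compared with some other practice, is exactly what I am unsure about.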

thanks for any info

regards
