hadoop-common-user mailing list archives

From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: Importing Data to HDFS
Date Tue, 20 Jul 2010 07:27:42 GMT
Hey Urckle,

I'm biased, but I'd recommend checking out Sqoop (
http://github.com/cloudera/sqoop) for moving data from relational databases
into HDFS/Hive/HBase, and Flume (http://github.com/cloudera/flume) for moving
log files into HDFS/Hive/HBase.

For moving large sets of files into HDFS, I think distcp (
http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/distcp.html) is your
best bet.
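A minimal distcp invocation looks like the following (cluster names and paths are placeholders); distcp runs the copy as a MapReduce job, so large trees are copied in parallel:

```shell
# Copy a directory tree between clusters (or within one) in parallel.
# The namenode hosts and paths here are placeholders.
hadoop distcp hdfs://namenode-a:8020/data/feeds hdfs://namenode-b:8020/data/feeds
```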


On Fri, Jul 16, 2010 at 4:51 AM, Urckle <urckle@gmail.com> wrote:

> Scenario:
> Hadoop version: 0.20.2
> MR coding will be done in java.
> Just starting out with my first Hadoop setup. I would like to know whether
> there are any best-practice ways to load data into the DFS. I have (obviously)
> manually put data files into HDFS using the shell commands while playing
> with it at setup, but going forward I will want to retrieve large
> numbers of data feeds from remote, third-party locations and throw them
> into Hadoop for later analysis. What is the best way to automate this? Is it
> to gather the retrieved files into known locations to be mounted and then
> put the files into HDFS via an automated script? Or is there some other
> practice? I've not been able to find a specific use case yet... all docs cover
> the basic fs commands without giving much detail about more advanced setups.
> thanks for any info
> regards
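For the automation the question asks about, one common pattern (not from this thread; all paths here are placeholders) is a cron-driven script that pushes completed downloads from a local landing directory into HDFS with the fs shell, then moves them aside so nothing is uploaded twice:

```shell
#!/bin/sh
# Hypothetical cron job: push fully downloaded feed files into HDFS.
# LANDING, DONE, and HDFS_DIR are placeholder paths for this sketch.
LANDING=/var/feeds/incoming     # where the fetcher drops finished files
DONE=/var/feeds/uploaded        # local archive of files already pushed
HDFS_DIR=/data/feeds/$(date +%Y/%m/%d)

# On 0.20.x, -mkdir creates missing parent directories.
hadoop fs -mkdir "$HDFS_DIR"
for f in "$LANDING"/*; do
  [ -f "$f" ] || continue
  # Only move the local file aside if the upload succeeded.
  if hadoop fs -put "$f" "$HDFS_DIR"/; then
    mv "$f" "$DONE"/
  fi
done
```

The key design point is to keep the fetch step and the upload step decoupled: the fetcher writes into the landing directory, and the uploader only touches files that are fully written there.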
