hadoop-mapreduce-user mailing list archives

From Kai Voigt...@123.org>
Subject Re: Importing Data into HDFS
Date Wed, 29 Aug 2012 21:03:30 GMT

On 29.08.2012 at 22:58, Steve Sonnenberg <steveisoft@gmail.com> wrote:

> Is there any way to import data into HDFS without copying it in? (kind of like by reference)
> I'm pretty sure the answer to this is no.
> What I'm looking for is something that will take existing NFS data and access it as an
> HDFS filesystem.
> Use case: I have existing data in a warehouse that I would like to run MapReduce etc.
> on without copying it into HDFS.
> If the data were in S3, could I run MapReduce on it?

Hadoop has a filesystem abstraction layer that supports many physical filesystem implementations:
HDFS of course, but also the local filesystem, S3, FTP, and others.
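Because the filesystem implementation is selected by the URI scheme of the input path, you can point a job at non-HDFS storage directly. A rough sketch (the jar name, bucket, and paths are made up for illustration; s3n:// was the native S3 scheme in the Hadoop 1.x line):

```shell
# Input on the local filesystem: the file:// scheme selects LocalFileSystem.
hadoop jar hadoop-examples.jar wordcount \
    file:///mnt/nfs/warehouse/input hdfs:///user/steve/output-local

# Input on S3: s3n:// selects the native S3 filesystem (credentials must be
# configured, e.g. fs.s3n.awsAccessKeyId in core-site.xml).
hadoop jar hadoop-examples.jar wordcount \
    s3n://my-bucket/warehouse/input hdfs:///user/steve/output-s3
```

If the NFS export is mounted at the same path on every worker node, the file:// variant gives roughly the "by reference" behavior asked about, just without data locality.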

You simply lose data locality if you're running MapReduce on data that is, well, not local
to where it's being processed.

With data stored in S3, a common solution is to fire up an EMR (Elastic MapReduce) cluster
inside Amazon's datacenter to work on your S3 data. It's not real data locality, but at least
the processing happens in the same data center as your data. And once you're done processing
the data, you can take down the EMR cluster.
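That spin-up, process, tear-down cycle can be sketched with today's AWS CLI (the cluster name, instance sizes, bucket, jar, and the j-… cluster id are all placeholders, not values from this thread):

```shell
# Start a transient Hadoop cluster (release label and sizes are illustrative).
aws emr create-cluster --name warehouse-job --applications Name=Hadoop \
    --release-label emr-6.10.0 --instance-type m5.xlarge --instance-count 3 \
    --use-default-roles

# Run a step that reads its input directly from S3 (substitute the cluster id
# printed by the previous command).
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps \
    Type=CUSTOM_JAR,Jar=s3://my-bucket/jobs/wordcount.jar,Args=s3://my-bucket/input,s3://my-bucket/output

# Tear the cluster down once the step finishes.
aws emr terminate-clusters --cluster-ids j-XXXXXXXXXXXXX
```

You pay for the cluster only while it exists, which is the main appeal of this pattern.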


Kai Voigt
