From: Kai Voigt <k@123.org>
Subject: Re: Importing Data into HDFS
To: user@hadoop.apache.org
Date: Wed, 29 Aug 2012 23:03:30 +0200

Hello,

On 29.08.2012, at 22:58, Steve Sonnenberg wrote:

> Is there any way to import data into HDFS without copying it in? (kind of like by reference)
> I'm pretty sure the answer to this is no.
>
> What I'm looking for is something that will take existing NFS data and access it as an HDFS filesystem.
> Use case: I have existing data in a warehouse that I would like to run MapReduce etc. on without copying it into HDFS.
>
> If the data were in S3, could I run MapReduce on it?

Hadoop has a filesystem abstraction layer that supports many physical filesystem implementations: HDFS of course, but also the local filesystem, S3, FTP, and others.

You simply lose data locality if you're running MapReduce on data that is, well, not local to where it's being processed.

With data stored in S3, a common solution is to fire up an EMR (Elastic MapReduce) cluster inside Amazon's datacenter to work on your S3 data. It's not real data locality, but at least the processing happens in the same data center as your data. And once you're done processing the data, you can take down the EMR cluster.

Kai

-- 
Kai Voigt
k@123.org
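To make the filesystem abstraction concrete: a path in Hadoop is a URI, and the scheme (hdfs://, file://, s3n://) selects which FileSystem implementation is used, so the same shell commands and jobs can read data without first copying it into HDFS. This is only a sketch under assumptions: the host, bucket, and paths below are made-up placeholders, and it uses the s3n:// scheme and credential properties of the Hadoop 1.x era this thread dates from (these commands need a configured Hadoop installation to actually run).

```
# Same command, different backing filesystems -- the URI scheme picks the implementation:
hadoop fs -ls hdfs://namenode:8020/warehouse/
hadoop fs -ls file:///mnt/nfs/warehouse/       # an NFS mount read in place, no copy into HDFS
hadoop fs -ls s3n://my-bucket/warehouse/

# A MapReduce job can likewise take an S3 input and write its output to HDFS (or vice versa):
hadoop jar myjob.jar MyJob s3n://my-bucket/input/ hdfs://namenode:8020/output/
```

For s3n:// to work, the AWS credentials go into core-site.xml via the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties. Note the caveat above still applies: reading file:// or s3n:// inputs this way gives up data locality, so the tasks pull every byte over the network.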