From: Allen Wittenauer <awittenauer@linkedin.com>
To: hdfs-dev@hadoop.apache.org
Subject: Re: Integrating Lustre and HDFS
Date: Tue, 15 Jun 2010 17:24:43 +0000

No, I'm saying your MapReduce code needs to explicitly reference every
file system that it needs to access. You can't rely upon
fs.default.name*. The distcp code could provide some guidance on how
to do this.

* Maybe it isn't clear why this is, so let me spell it out a bit:
fs.default.name is just that: a default. When you run hadoop dfs -ls
with no qualifying file system URL, Hadoop uses fs.default.name to
figure out where that file system actually lives. Since you need to
access two different file systems, you can't safely make that
assumption. This is also why you can't list two file systems in
fs.default.name: if you then ran 'hadoop dfs -ls', there would be no
logical answer as to which file system Hadoop should consult,
especially if the requested paths *conflict*.
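Roughly, it looks like the sketch below. This is just an illustration
reusing the placeholder mount point and namenode address from earlier
in this thread, not working values; the distcp source is the place to
look for the real thing.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TwoFileSystems {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The Lustre mount is reached through the local file:// scheme.
        // (Note the three slashes: file:///path, not file://path, which
        // would parse the mount point as a host name.)
        FileSystem lustre =
            FileSystem.get(URI.create("file:///my_lustre_mount_point"), conf);

        // HDFS is named explicitly by its namenode URI, so this works
        // no matter what fs.default.name is set to.
        FileSystem hdfs =
            FileSystem.get(URI.create("hdfs://localhost:9123"), conf);

        // Fully qualified Paths always resolve to the right file system.
        FileStatus[] stats =
            hdfs.listStatus(new Path("hdfs://localhost:9123/"));
        if (stats != null) {
          for (FileStatus stat : stats) {
            System.out.println(stat.getPath());
          }
        }
        stats = lustre.listStatus(new Path("file:///my_lustre_mount_point"));
        if (stats != null) {
          for (FileStatus stat : stats) {
            System.out.println(stat.getPath());
          }
        }
      }
    }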
On Jun 15, 2010, at 3:31 AM, Vikas Ashok Patil wrote:

> Hello Allen,
>
> Sorry for bugging you regarding the same problem again. If you say "we
> need to be explicit having multiple file systems" for MapReduce jobs,
> are you hinting at code changes to be made to Hadoop? Please provide
> more details on this if possible.
>
> Thanks,
> Vikas
>
> On Sat, Jun 12, 2010 at 9:05 AM, Vikas Ashok Patil wrote:
>
>> Hello Allen,
>>
>> Thanks for the reply.
>>
>> You are right about us trying to run two distributed file systems.
>> The reason is that there are certain restrictions (in our cluster
>> environment) on including the local file system in Lustre. Please
>> tell me how I would make MapReduce access more than one file system.
>> At least the configs don't seem to allow it.
>>
>> Thanks,
>> Vikas A Patil
>>
>>
>> On Sat, Jun 12, 2010 at 12:32 AM, Allen Wittenauer
>> <awittenauer@linkedin.com> wrote:
>>
>>> On Jun 10, 2010, at 8:27 PM, Vikas Ashok Patil wrote:
>>>
>>>> Thanks for the replies.
>>>>
>>>> If I have fs.default.name = file://my_lustre_mount_point, then only
>>>> the Lustre file system will be used. I would like to have something
>>>> like
>>>>
>>>> fs.default.name = file://my_lustre_mount_point, hdfs://localhost:9123
>>>>
>>>> so that both the local file system and Lustre are in use.
>>>>
>>>> Kindly correct me if I am missing something here.
>>>
>>> I guess we're all confused as to your use case. Why do you want to
>>> run two distributed file systems on the same nodes? Why can't you
>>> use Lustre for all your needs?
>>>
>>> As to fs.default.name, you can only have one. [That's why it is a
>>> default. *smile*] If you want to access more than one file system
>>> from within MapReduce, you'll need to specify it explicitly.
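P.S. To make the "only one default" point concrete, here is a sketch of
the config side, again reusing the placeholder values from this thread
(the listed paths are made up):

    <!-- core-site.xml: fs.default.name takes exactly one URI -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9123</value>
    </property>

    # Unqualified paths resolve against the default file system
    # (HDFS in this sketch):
    hadoop dfs -ls /user/data
    # Any other file system has to be named explicitly:
    hadoop dfs -ls file:///my_lustre_mount_point/user/data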