From: Remy Dubois <rdubois@talend.com>
To: user@hadoop.apache.org
Subject: Hadoop and HttpFs
Date: Fri, 3 Apr 2015 12:56:04 +0000

Hi everyone,

I have always understood the constraint that a Hadoop client has to know, and to have access to, every single datanode in order to read from or write to HDFS. What happens if there are strong security policies on top of our cluster?

I found HttpFs (and webhdfs), which allow a client to talk to a single machine, which is exactly what I'm looking for. Operations on HDFS do indeed work fine that way.

Then, I tried to execute a Pig job (Pig 0.12 on top of Hadoop 2.3.0) the same way. Here, the FileContext and AbstractFileSystem classes don't allow any FileSystem other than hdfs and local, so WebHdfs is not accepted.

It's not a problem until you need to register a jar in your Pig application. For the Load and the Store, prefixing their paths with the webhdfs:// scheme works. But when you register a jar, the PigServer reuses the initial configuration (the one with hdfs://) to send the jars to the distributed cache, and at that point it fails because the client doesn't have access to the datanodes.

Am I right in my understanding of what happens in that case? Also, has anyone met this issue already? Any solution?
Workaround?

Thanks a lot in advance,

Rémy.
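P.S. To make the "single machine" access I mean concrete, here is roughly how I exercise the gateway with plain curl. The hostname, port, and user below are placeholders, not values from my real cluster (14000 is the default HttpFS port):

```shell
# Hypothetical HttpFS gateway and HDFS user -- placeholders for illustration.
GATEWAY="httpfs.example.com:14000"
HDFS_USER="remy"

# WebHDFS REST URL for listing a home directory. Only the gateway is
# contacted; HttpFS talks to the namenode and datanodes on the client's
# behalf instead of redirecting the client to a datanode.
LIST_URL="http://${GATEWAY}/webhdfs/v1/user/${HDFS_USER}?op=LISTSTATUS&user.name=${HDFS_USER}"
echo "$LIST_URL"

# Against a live cluster this would return a JSON FileStatuses document:
# curl -s "$LIST_URL"
```

This is exactly why HttpFs plays well with a firewalled cluster: the client only ever needs a route to that one host.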