hadoop-mapreduce-user mailing list archives

From Remy Dubois <rdub...@talend.com>
Subject Hadoop and HttpFs
Date Fri, 03 Apr 2015 12:56:04 GMT
Hi everyone,

I've been thinking about the constraint that a Hadoop client has to know, and have network access to, every single datanode in order to read from or write to HDFS. What happens when strong security policies sit on top of our cluster?
I found HttpFs (and WebHDFS), which let a client talk to a single machine in order to do what I'm looking for. Plain HDFS operations do indeed work fine this way.
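To make the single-machine idea concrete, here is a minimal sketch of how a client-side REST URL is built for HttpFs/WebHDFS. The URL layout (`http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=...`) follows the documented WebHDFS REST API; the gateway host, port, and path below are hypothetical.

```python
from urllib.parse import urlencode

def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS/HttpFs REST URL for an HDFS path.

    Only the gateway host needs to be reachable to issue the request;
    hdfs_path must be absolute (start with '/').
    """
    query = urlencode(dict(op=op, **params))
    return f"http://{host}:{port}/webhdfs/v1{hdfs_path}?{query}"

# Example: open a file through a (hypothetical) HttpFs gateway on port 14000.
print(webhdfs_url("gateway.example.com", 14000, "/user/remy/data.txt", "OPEN"))
# -> http://gateway.example.com:14000/webhdfs/v1/user/remy/data.txt?op=OPEN
```

Note the operational difference: plain WebHDFS on the namenode answers reads/writes with HTTP redirects to the datanodes, whereas HttpFs proxies the data itself, so only HttpFs keeps all traffic on the single gateway machine.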

Then I tried to execute a Pig job (Pig 0.12 on top of Hadoop 2.3.0) the same way. Here, the FileContext and AbstractFileSystem classes don't allow any FileSystem other than hdfs and local, so WebHDFS is not accepted.
That's not a problem until you need to register a jar in your Pig application. For the Load and the Store, prefixing their paths with the webhdfs:// scheme works. But when you register a jar, the PigServer reuses the initial configuration (the one with hdfs://) to ship the jars to the distributed cache, and at that point it fails because the client doesn't have access to the datanodes.

Am I right in my understanding of what happens in that case?
Also, has anyone run into this issue already? Any solution? Workaround?

Thanks a lot in advance,

