hadoop-user mailing list archives

From Peyman Mohajerian <mohaj...@gmail.com>
Subject Re: Hadoop and HttpFs
Date Fri, 03 Apr 2015 21:01:29 GMT
Maybe this helps:

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig
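
To make the link concrete: WebHCat (Templeton) lets a client submit a Pig job over REST, so the job runs server-side and the client never touches the datanodes. A minimal sketch of what such a submission request looks like — the gateway host is a placeholder (50111 is WebHCat's default port), and the request is only built here, not sent:

```python
from urllib.parse import urlencode

# Assumed WebHCat (Templeton) gateway -- adjust host/port for your cluster.
WEBHCAT = "http://webhcat-host.example.com:50111/templeton/v1"

def pig_submission(user, script_hdfs_path, status_dir):
    """Build the URL and form body for POST /templeton/v1/pig.

    WebHCat runs the Pig job on the cluster side, so the client only
    needs access to this one gateway, not to every datanode.
    """
    url = f"{WEBHCAT}/pig"
    body = urlencode({
        "user.name": user,          # user to run the job as
        "file": script_hdfs_path,   # HDFS path of the Pig script
        "statusdir": status_dir,    # where stdout/stderr/exit code land
    })
    return url, body

url, body = pig_submission("remy", "/user/remy/job.pig", "/user/remy/status")
print(url)
print(body)
```

One would then POST that body with curl or an HTTP library; the WebHCat reference linked above documents the full parameter list.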



On Fri, Apr 3, 2015 at 5:56 AM, Remy Dubois <rdubois@talend.com> wrote:

>  Hi everyone,
>
>
>
> I've been thinking about the constraint that a Hadoop client has to know,
> and have network access to, every single datanode in order to read/write
> from/to HDFS. What happens if there are strong security policies on top of
> our cluster?
>
> I found HttpFs (and WebHDFS), which allows a client to talk to a single
> machine in order to do what I'm looking for. Operations on HDFS do indeed
> work fine.
>
>
>
> Then I tried to execute a Pig job (Pig 0.12 on top of Hadoop 2.3.0) the
> same way. Here, the FileContext and AbstractFileSystem classes don't allow
> any FileSystem other than hdfs and local, so WebHDFS is not accepted.
>
> That isn't a problem until you need to register a jar in your Pig
> application. For the Load and the Store, prefixing their paths with the
> webhdfs:// scheme does work. But when you register a jar, the PigServer
> reuses the initial configuration (the one with hdfs://) to send the jars
> to the distributed cache, and at that point it fails because the client
> doesn't have access to the datanodes.
>
>
>
> Am I right in my understanding of what happens in that case?
>
> Also, has anyone already met this issue? Any solution? Workaround?
>
>
>
> Thanks a lot in advance,
>
>
>
> Rémy.
>
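
For what it's worth, the single-gateway pattern Rémy describes can be sketched by building WebHDFS-style REST URLs against an HttpFs gateway. Host and paths below are placeholders (14000 is HttpFs's default port), and nothing is actually sent:

```python
from urllib.parse import urlencode

# Assumed HttpFs gateway -- the one machine the client is allowed to reach.
HTTPFS = "http://httpfs-host.example.com:14000/webhdfs/v1"

def webhdfs_url(path, op, **params):
    """Build a WebHDFS REST URL for an HDFS operation.

    HttpFs proxies the file data itself, so the client talks only to
    this gateway. With plain WebHDFS on the namenode, reads and writes
    are redirected to datanodes -- exactly what strict firewall
    policies tend to block.
    """
    query = urlencode({"op": op, **params})
    return f"{HTTPFS}{path}?{query}"

print(webhdfs_url("/user/remy", "LISTSTATUS", **{"user.name": "remy"}))
print(webhdfs_url("/user/remy/data.txt", "OPEN", **{"user.name": "remy"}))
```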
