spark-user mailing list archives

From Marcelo Vanzin <>
Subject Re: Ranger-like Security on Spark
Date Fri, 04 Sep 2015 00:43:28 GMT
On Thu, Sep 3, 2015 at 5:15 PM, Matei Zaharia <> wrote:
> Even simple Spark-on-YARN should run as the user that submitted the job,
> yes, so HDFS ACLs should be enforced. Not sure how it plays with the rest of
> Ranger.

It's slightly more complicated than that (without kerberos, the
underlying process runs as the same user running the YARN daemons, but
the connections to HDFS and other Hadoop services identify as the user
who submitted the application), but the end effect is what Matei
describes. I also do not know how Ranger enforces things.

Also note that "simple authentication" is not secure at all. You're
basically just asking your users to be nice instead of actually
enforcing anything. Any user can tell YARN that he's actually someone
else when starting the application, and YARN will believe him. Just
set "HADOOP_USER_NAME=somebodyelse" in the environment and you're good
to go!
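
To make that concrete, here is a toy Python sketch of the behavior
described above. It is not Hadoop's actual code: the function names and
the "yarn" daemon account are made up for illustration, and the only
thing it assumes is that, under simple authentication, a client-supplied
HADOOP_USER_NAME is taken at face value.

```python
# Toy model only: names below are hypothetical, not Hadoop or YARN APIs.

def resolve_simple_auth_user(env, os_user):
    """Pick the identity a client claims under simple authentication.

    If HADOOP_USER_NAME is set in the environment it wins; otherwise
    the OS login name is used. Nothing verifies the claim.
    """
    claimed = env.get("HADOOP_USER_NAME")
    return claimed if claimed else os_user


def submit_application(env, os_user, yarn_daemon_user="yarn"):
    """Return the two identities involved in a non-Kerberos submission.

    process_user: the account the container process runs as
                  (the user running the YARN daemons).
    hadoop_user:  the identity presented to HDFS and other services.
    """
    return {
        "process_user": yarn_daemon_user,
        "hadoop_user": resolve_simple_auth_user(env, os_user),
    }


# An honest submission: HDFS sees the real login name.
honest = submit_application(env={}, os_user="daniel")

# A spoofed submission: same login, but a different claimed identity.
spoofed = submit_application(
    env={"HADOOP_USER_NAME": "somebodyelse"}, os_user="daniel"
)

print(honest)
print(spoofed)
```

In the second call nothing checks that "daniel" is allowed to act as
"somebodyelse", so HDFS permissions end up being applied to whatever
name the client chose to send.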

> On Sep 3, 2015, at 4:57 PM, Jörn Franke <> wrote:
> Well, if it needs to read from HDFS then it will adhere to the permissions
> defined there and/or in Ranger. However, I am not aware that you can protect
> DataFrames, tables, or streams in general in Spark.
> On Thu, Sep 3, 2015 at 21:47, Daniel Schulz <> wrote:
>> Hi Matei,
>> Thanks for your answer.
>> My question is regarding simple authenticated Spark-on-YARN only, without
>> Kerberos. So when I run Spark on YARN and HDFS, Spark will pass through my
>> HDFS user and only be able to access files I am entitled to read/write? Will
>> it enforce HDFS ACLs and Ranger policies as well?
>> Best regards, Daniel.
>> > On 03 Sep 2015, at 21:16, Matei Zaharia <> wrote:
>> >
>> > If you run on YARN, you can use Kerberos, be authenticated as the right
>> > user, etc in the same way as MapReduce jobs.
>> >
>> > Matei
>> >
>> >> On Sep 3, 2015, at 1:37 PM, Daniel Schulz
>> >> <> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I really enjoy using Spark. An obstacle to selling it to our clients
>> >> currently is the missing Kerberos-like security on a Hadoop cluster with
>> >> simple authentication. Are there plans, a proposal, or a project to
>> >> deliver a Ranger plugin or something similar for Spark? The goal is to
>> >> differentiate users and their privileges when reading and writing data to
>> >> HDFS. Is Kerberos my only option then?

