hadoop-common-dev mailing list archives

From "Colin P. McCabe" <cmcc...@apache.org>
Subject Re: Hadoop Common: Why not re-use the Security model offered by SELINUX?
Date Mon, 30 Mar 2015 18:14:46 GMT
As ATM and Steve have already commented, selinux isn't really
comparable to the existing Hadoop security framework.  The two serve
different functions.  The Hadoop security framework needs to deal with
authenticating users over the network, managing Kerberos and Active
Directory, and using abstractions like authentication tokens to
perform remote auth.  selinux works in a single-node context to set up
constraints on what actions daemons can perform: essentially a very
detailed list of rules of what they can
and cannot do.  This list of rules goes beyond the traditional POSIX
permission model or even POSIX extended attributes.
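
To make the "detailed list of rules" idea concrete, here is a toy
sketch in Python.  It is not real selinux policy syntax, and the type
labels (hadoop_datanode_t, hadoop_data_t, and so on) are made up for
illustration.  It only shows the default-deny shape of such a rule
list: an action is permitted only if some rule explicitly allows it.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    source: str     # label of the process making the request
    target: str     # label of the object being accessed
    obj_class: str  # kind of object, e.g. "file" or "tcp_socket"
    perm: str       # requested permission, e.g. "read"


# Hypothetical labels and rules, invented for this sketch only.
ALLOW = {
    Rule("hadoop_datanode_t", "hadoop_data_t", "file", "read"),
    Rule("hadoop_datanode_t", "hadoop_data_t", "file", "write"),
    Rule("hadoop_datanode_t", "hadoop_port_t", "tcp_socket", "name_bind"),
}


def is_allowed(source: str, target: str, obj_class: str, perm: str) -> bool:
    """Default deny: permit only what a rule explicitly allows."""
    return Rule(source, target, obj_class, perm) in ALLOW
```

With these rules, is_allowed("hadoop_datanode_t", "hadoop_data_t",
"file", "read") is True, while the same daemon asking to read a file
labeled "shadow_t" is refused because no rule mentions that pair.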

I do feel that having an selinux policy for Hadoop could provide
defense in depth in certain cases.  However, you have to realize a few
things:

1. A lot of Linux security vulnerabilities are kernel-level
vulnerabilities that selinux can't mitigate.  If you give a bad guy
the ability to run arbitrary code on your Linux box, it's pretty
likely that you'll get rooted by something like CVE-2014-9322 or
CVE-2014-3153, with or without selinux.

2. selinux policies are very complex and need to change any time
daemon behavior or configuration changes.  For that reason, they are
usually maintained by the Linux distribution maintainer, not the
software maintainer.

3. It will be almost impossible to write a reasonable selinux policy
for an arbitrary YARN job because you don't know what that job needs
to do.  It's hard to write a detailed list of rules for something that
has arbitrary user-defined behavior.

4. Hadoop is written in Java, so compromising Hadoop daemons through
buffer overflows, etc. is not the biggest threat.  There probably are
such compromises out there, but pretty much all of our CVEs have been
in other areas.


On Sat, Mar 28, 2015 at 6:06 AM, Steve Loughran <stevel@hortonworks.com> wrote:
> SELinux does nothing for Hadoop cluster security at the data layer, which
> is why there are tools on top, not only to lock down systems, but to
> provide better data governance: where did things come from, has it been
> tainted by merging with sensitive data, etc, etc.
> Where it could be good is
> 1. Allow Hadoop nodes to be more secure on the intranet itself. It's
> another layer in the defense-in-depth story, so if some standard Linux
> service on the system (ssh, ntpd, ...) gets compromised, the damage is
> partially limited. My home server is SELinux-enforced, for example.
> 2. Reduce the impact of anything malicious trying to run as a YARN-scheduled app.
> #2 is moot until you have Kerberos up; until then the whole of HDFS is
> visible. Once you have it up, SELinux could restrict what damage a
> privilege-escalated YARN job could do to the local hosts. But I'm still
> reasonably confident that given the ability to run 200+ containers on a
> Hadoop cluster for a few hours I could (a) portscan an intranet for SMB &
> SharePoint hosts, and (b) open enough TCP connections to overload the
> services.
> I'm +1 to getting Hadoop to run on SELinux; I think mainly we've been lazy.
> But it's not going to keep your Hadoop-stored data safe, lock down your
> network apps or help mitigate the intentional or unintentional damage that
> Hadoop code can do if on the same intranet as the rest of your
> organisation. Or, as AW on Nicholas can attest, the damage you can do by
> running network traffic- or CPU-intensive code that takes down the network
> or power supplies of the rest of the datacentre.
>> On 28 Mar 2015, at 02:33, jay vyas <jayunit100.apache@gmail.com> wrote:
>> Tools like FreeIPA and so on are very synergistic first steps down the road
>> of making Hadoop more enterprise friendly.  For example, if you let FreeIPA
>> manage users, Kerberos and so on, then you can pave the way down the road
>> for SELinux as well (since these tools are able to work together).
>> I think in general, the more Hadoop works with the Linux community, rather
>> than rebuilding its own solutions, the easier it will be to integrate in
>> broader and broader deployments, so in theory working to run SELinux and
>> Hadoop together is probably a win-win.
>> On Thu, Mar 26, 2015 at 1:22 PM, Aaron T. Myers <atm@cloudera.com> wrote:
>>> In addition to everything Allen has already said, which I entirely agree
>>> with, I'll also point out that much of the focus on Hadoop security has
>>> been related to authentication, and only somewhat more recently on
>>> providing advanced authorization capabilities. I'll readily admit to not
>>> knowing much about SELinux's capabilities, but my impression is that it
>>> wouldn't do much to help with authentication within Hadoop,
>>> and hence wouldn't have been a realistic option when Hadoop's security work
>>> was started many years ago.
>>> --
