hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5569) WebHDFS should support a deny/allow list for data access
Date Wed, 04 Dec 2013 02:44:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838528#comment-13838528 ]

Colin Patrick McCabe commented on HDFS-5569:

Hi Adam, I don't understand why you jump immediately to the assumption that IP spoofing is
necessary to break IP-based authentication.  There are plenty of networks in the world that
you can join without any trouble.  One example is a class B network such as 172.16.X.Y.  If
the administrator tries to filter addresses 172.16.1.X but allow 172.16.2.X, it would be easy
for an attacker to change his address from a 172.16.1.X to a 172.16.2.X using just ifconfig.
I used to work at a company with such a class B address.  You may argue that the system administrator
was foolish to believe that he was secure in this scenario.  I would argue that giving people
confusing security configuration options is foolish.
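
To make the scenario above concrete, here is a minimal Python sketch of the kind of
address-based check being discussed.  The rule lists and the is_allowed function are
hypothetical illustrations, not part of WebHDFS or of any patch on this issue.

```python
import ipaddress

# Hypothetical rules modeled on the example above: deny 172.16.1.X, allow 172.16.2.X.
DENY = [ipaddress.ip_network("172.16.1.0/24")]
ALLOW = [ipaddress.ip_network("172.16.2.0/24")]


def is_allowed(source_ip: str) -> bool:
    """Decide access using nothing but the source address of the connection."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in DENY):
        return False
    # Anything not explicitly allowed is rejected.
    return any(addr in net for net in ALLOW)


# The same machine gets a different answer after it is readdressed.
before = is_allowed("172.16.1.50")
after = is_allowed("172.16.2.50")
```

The check has no input other than the address the connection presents, so a host that can
pick its own address on the shared network also picks the outcome.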

In many cases, we can also use "source routing" to get around IP-based restrictions.  Keep
in mind, this does not require spoofing!  Source routing allows the packet to specify its
own route through the network.  This potentially allows you to reach destinations that you
would not otherwise be able to get to.  Many routers now drop source-routed packets, but why
open a hole for those that do not?

Now, let's turn to considering spoofing itself.  Successful IP spoofing often does not allow
the attacker to get back a response to his packets.  However, that isn't necessarily needed
in this case, because there are many WebHDFS operations that delete files, damage data, etc.

Physical security of networks is often an issue.  Open Ethernet jacks are frequently available
in an office or data center, and you can get an IP address from one, maybe even an address
that is inside the various firewalls.  This is why people use real security systems like
Kerberos, Active Directory, etc.

There's more information on how to defeat IP-based filtering here:  http://technet.microsoft.com/library/cc723706.aspx
It calls SNMP "a security disaster" partly because it often relies on IP-based filtering
for security.  I don't think we should be trying to reproduce a security scheme that everyone
agrees is a disaster.

Regarding DNS: I've dealt with many clusters for which DNS lookup was a bottleneck.  You may
argue that they should have configured DNS better.  But regardless, a security scheme that
requires contacting DNS all the time would still cause significant regressions for those users.
See Daryn Sharp's patch for https://issues.apache.org/jira/browse/HDFS-3990, which was designed
partly to avoid unnecessary DNS lookups.  There have been many other such patches, from people
at Yahoo and other companies.
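
As a rough illustration of the DNS cost, the Python sketch below assumes a hostname-based
rule that has to turn each client address into a name before it can decide.  The
make_host_filter helper, the injected resolve callable, and the example domain are all
hypothetical.  On a real server the resolver might be something like
lambda ip: socket.gethostbyaddr(ip)[0], which goes out to DNS.

```python
from typing import Callable, Dict, Tuple


def make_host_filter(
    resolve: Callable[[str], str], allowed_suffix: str
) -> Tuple[Callable[[str], bool], Dict[str, int]]:
    """Build a hostname-based check; `resolve` maps an IP string to a hostname."""
    stats = {"lookups": 0}

    def check(source_ip: str) -> bool:
        # Without a cache, every request costs one resolver call.
        stats["lookups"] += 1
        return resolve(source_ip).endswith(allowed_suffix)

    return check, stats


# Stand-in resolver so the sketch runs without touching the network.
FAKE_DNS = {"172.16.2.50": "worker1.example.com", "172.16.1.50": "guest.example.org"}
check, stats = make_host_filter(lambda ip: FAKE_DNS.get(ip, ""), ".example.com")

results = [check("172.16.2.50"), check("172.16.2.50"), check("172.16.1.50")]
```

Three requests cause three lookups here, two of them for the same address, which is the
kind of repeated lookup that becomes a bottleneck on a busy cluster.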

I think you should explain why the various alternatives people have offered here don't solve
your problem.  It seems really easy to use httpfs (plus perhaps a proxy) to get filtering
as fine-grained as you want.  The whole point of implementing httpfs was that the HTTP protocol
could easily be filtered, proxied, etc. by third-party tools.  If there are use cases that
httpfs does not address, let's fix them rather than creating another parallel security system
that does not follow best practices.

> WebHDFS should support a deny/allow list for data access
> --------------------------------------------------------
>                 Key: HDFS-5569
>                 URL: https://issues.apache.org/jira/browse/HDFS-5569
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Adam Faris
>              Labels: features
>
> Currently we can't restrict what networks are allowed to transfer data using WebHDFS.
> Obviously we can use firewalls to block ports, but this can be complicated and problematic
> to maintain.  Additionally, because all the jetty servlets run inside the same container,
> blocking access to jetty to prevent WebHDFS transfers also blocks the other servlets running
> inside that same jetty container.
> I am requesting a deny/allow feature be added to WebHDFS.  This is already done with
> the Apache HTTPD server, and is what I'd like to see the deny/allow list modeled after.

This message was sent by Atlassian JIRA