hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10099) Reduce chance for RPC denial of service
Date Thu, 14 Nov 2013 15:25:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822511#comment-13822511 ]

Daryn Sharp commented on HADOOP-10099:
--------------------------------------

This isn't an easy problem to solve correctly.  Luckily, hadoop clients are "well behaved"
in the sense that only one socket is opened per ugi, and reconnects are delayed.  Malicious
clients are the real problem.
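To make the "well behaved" client behavior concrete, here is a minimal Python sketch, assuming a client that caches at most one connection per (server, ugi) pair and waits a fixed delay before reconnecting. The ConnectionCache class, its reconnect_delay_s parameter, and the injectable clock are hypothetical illustrations, not Hadoop's ipc.Client API.

```python
import time


class ConnectionCache:
    """Hypothetical client-side cache: at most one connection per (server, ugi),
    with a minimum delay between connection attempts for the same key."""

    def __init__(self, reconnect_delay_s, clock=time.monotonic):
        self.reconnect_delay_s = reconnect_delay_s
        self.clock = clock          # callable returning seconds
        self.conns = {}             # (server, ugi) -> connection object
        self.last_attempt = {}      # (server, ugi) -> time of last connect
        self.opened = 0             # number of sockets actually opened

    def get(self, server, ugi):
        key = (server, ugi)
        if key in self.conns:
            return self.conns[key]  # reuse the existing socket for this ugi
        now = self.clock()
        last = self.last_attempt.get(key)
        if last is not None and now - last < self.reconnect_delay_s:
            # Well-behaved backoff: refuse to reconnect too quickly.
            raise ConnectionError("reconnect delayed")
        self.last_attempt[key] = now
        conn = object()             # stand-in for a real socket
        self.opened += 1
        self.conns[key] = conn
        return conn

    def drop(self, server, ugi):
        """Forget a connection, e.g. after the server closed it."""
        self.conns.pop((server, ugi), None)
```

Repeated calls for the same ugi reuse one socket, so a cooperating client opens very few connections. A malicious client simply would not use such a cache, which is why server-side measures are discussed next.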

Throttling by equating a client with a host is a trivial way to identify a client, but it's
undesirable in many cases.  One errant MR task spamming connections to the NN will trigger a
DoS for other tasks running on that node.  Likewise, a task spamming connections to the RM or
NM will DoS the AMs on that node that need to request or launch containers.  Admittedly, that's
still better than a total DoS.
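The collateral damage from host-only throttling can be sketched as follows, assuming a hypothetical PerHostLimiter that caps open connections per host (max_per_host is an illustrative parameter, not a Hadoop setting). Because the key is only the host, the limiter cannot distinguish the errant task's connections from an AM's on the same node.

```python
from collections import Counter


class PerHostLimiter:
    """Hypothetical throttle that identifies a client solely by its host."""

    def __init__(self, max_per_host):
        self.max_per_host = max_per_host
        self.open = Counter()  # host -> currently open connections

    def try_accept(self, host):
        """Accept unless the host is at its cap; the caller's task/ugi is ignored."""
        if self.open[host] >= self.max_per_host:
            return False
        self.open[host] += 1
        return True

    def release(self, host):
        """Record that one connection from host has closed."""
        if self.open[host] > 0:
            self.open[host] -= 1


if __name__ == "__main__":
    limiter = PerHostLimiter(max_per_host=10)
    # An errant task on node1 grabs every slot for its host...
    errant = [limiter.try_accept("node1") for _ in range(10)]
    # ...so a well-behaved AM on the same node is refused,
    # while clients on other nodes are unaffected.
    print(all(errant), limiter.try_accept("node1"), limiter.try_accept("node2"))
    # prints: True False True
```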

A combination of host + ugi would be a better identifier for throttling, but that's infeasible
because a client can cause a DoS simply by spamming socket opens, or even just socket
open/close cycles, in which case the ugi isn't even known yet.  A client can also open many
sockets and then trickle 1 byte at a time every 2*idle to keep each connection from idling out.
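The trickle attack works against any scanner that reaps connections based only on time since the last received byte. A minimal sketch, assuming a hypothetical IdleReaper with an illustrative 20 s idle limit (the names and numbers are assumptions, not Hadoop's Server internals), shows that a client sending a single byte more often than the limit is never reaped.

```python
class IdleReaper:
    """Hypothetical idle scanner that closes connections whose last received
    byte is older than max_idle_s. It tracks activity, not useful work."""

    def __init__(self, max_idle_s):
        self.max_idle_s = max_idle_s
        self.last_activity = {}  # conn_id -> time the last byte arrived

    def on_bytes(self, conn_id, now):
        """Any received byte, even a single one, resets the idle clock."""
        self.last_activity[conn_id] = now

    def scan(self, now):
        """Close and return the connections idle for longer than max_idle_s."""
        expired = [c for c, t in self.last_activity.items()
                   if now - t > self.max_idle_s]
        for c in expired:
            del self.last_activity[c]
        return expired


if __name__ == "__main__":
    reaper = IdleReaper(max_idle_s=20)
    reaper.on_bytes("quiet", 0)
    reaper.on_bytes("trickler", 0)
    closed = []
    for now in range(5, 65, 5):
        if now % 15 == 0:
            reaper.on_bytes("trickler", now)  # 1 byte every 15 s
        closed += reaper.scan(now)
    print(closed)  # prints ['quiet']: the trickler is never reaped
```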

----

Perhaps the KISS approach is to log a warning to the authorization log when a host exceeds
a connections-per-second rate or a total-open-connections-per-host watermark.
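A minimal sketch of that KISS approach, assuming a hypothetical monitor that only logs and never refuses a connection: it keeps a one-second sliding window of accept timestamps per host plus a count of currently open connections, and emits a warning when either exceeds its threshold. The class name, the max_per_sec and max_open thresholds, and the logger name rpc.auth are illustrative assumptions, not existing Hadoop configuration keys or loggers.

```python
import logging
from collections import defaultdict, deque

# Hypothetical logger name standing in for the authorization log.
log = logging.getLogger("rpc.auth")


class ConnectionWatermarkMonitor:
    """Warn-only monitor: records accepts per host and logs when a host
    exceeds a per-second rate or a total-open watermark. It never refuses
    a connection."""

    def __init__(self, max_per_sec, max_open, clock):
        self.max_per_sec = max_per_sec
        self.max_open = max_open
        self.clock = clock                # callable returning seconds
        self.recent = defaultdict(deque)  # host -> accept times in the last 1 s
        self.open = defaultdict(int)      # host -> currently open connections

    def on_accept(self, host):
        """Record one accepted connection and return any warnings logged."""
        now = self.clock()
        window = self.recent[host]
        window.append(now)
        # Drop accept times that have slid out of the one-second window.
        # The entry just appended always stays, so the deque never empties here.
        while now - window[0] >= 1.0:
            window.popleft()
        self.open[host] += 1

        warnings = []
        if len(window) > self.max_per_sec:
            warnings.append(f"{host} opened {len(window)} connections in the last "
                            f"second (limit {self.max_per_sec})")
        if self.open[host] > self.max_open:
            warnings.append(f"{host} has {self.open[host]} open connections "
                            f"(watermark {self.max_open})")
        for msg in warnings:
            log.warning(msg)
        return warnings

    def on_close(self, host):
        """Record one closed connection from host."""
        if self.open[host] > 0:
            self.open[host] -= 1
```

Keeping it warn-only sidesteps the identification problem above: a false positive costs a log line rather than a DoS of innocent tasks on the same node.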

I filed this jira in response to comments on my other RPC performance jiras.  I think this
is a minor issue since hadoop has survived thus far with no DoS protection.

> Reduce chance for RPC denial of service
> ---------------------------------------
>
>                 Key: HADOOP-10099
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10099
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Priority: Minor
>
> An RPC server may accept an unlimited number of connections unless indirectly bounded
> by a blocking operation in the RPC handler threads.  The NN's namespace locking happens to
> cause this blocking, but other RPC servers such as yarn's generate async events which allow
> unbridled connection acceptance.
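Rather than relying on handler threads happening to block, a server could bound concurrent connections explicitly. This is a minimal Python sketch, assuming a hypothetical BoundedAcceptor consulted on the accept path, with max_connections as an illustrative parameter; it is not how Hadoop's ipc.Server is implemented.

```python
import threading


class BoundedAcceptor:
    """Hypothetical explicit cap on concurrently open RPC connections,
    independent of whether handlers block (as NN locking does) or return
    immediately after queueing async events."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_accept(self):
        """Non-blocking: refuse instead of queueing when all slots are taken."""
        return self._slots.acquire(blocking=False)

    def on_close(self):
        """Free the slot held by a connection that has closed."""
        self._slots.release()
```

A global cap like this bounds resource use but still lets one flooding client starve everyone else, which is the identification problem discussed in the comment.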



--
This message was sent by Atlassian JIRA
(v6.1#6144)
