hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Luke Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9160) Adopt JMX for management protocols
Date Thu, 10 Jan 2013 00:48:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549249#comment-13549249
] 

Luke Lu commented on HADOOP-9160:
---------------------------------

Thanks for the feedback folks. I hope you all enjoyed the holidays!

Anyway, let's back up a bit and take a look at pros and cons of proposed management protocols
(JMX, custom REST, custom RPC) with a more systematic approach, since anecdotal arguments
about limited JMX deployment is not convincing, as one can argue that for any single Hadoop
cluster that doesn't use JMX, there are more than 10 WebSphere/JBoss/WebLogic/Tomcat clusters
that use JMX securely just fine.

Let me clarify the scope of the admin/management functions the proposed protocol targets:
dfsadmin, mradmin (and yarn equivalent) and resource management functions, e.g., commission/decommission/suspend/resume
data and compute (TT/NM) nodes, slots/resource configuration for compute nodes etc.  Anything
in user facing (FileSystem and MR interfaces) interfaces are not included. The users of the
protocols are sysadmins and management daemons. IMO, some of the most important aspects of
management protocol are: interoperability, compatibility, isolation, security and ease of
implementation.

*Interoperability* is critical for management protocols. It's unfriendly, tedious and architecturally
unsound to require common management tools to use application specific protocols. Just imagine
managing clusters with many workloads where Hadoop is only one of them and each component
requires a different management protocol. This is one of the main reasons that we have standards
in management protocols. JMX is _the_ standard for managing JVM services. If JMX is based
on HTTP instead of RMI, we probably won't have this debate, as it's reasonable to argue that
JMX on RMI is unfriendly to non-JVM languages. Fortunately because JMX _is_ the standard,
people outside the Hadoop community have already addressed the concern with REST APIs for
JMX. When I filed HADOOP-7144, the implementations were not mature enough to adopt. Now a
promising open source implemenation have emerged: [Jolokia|http://www.jolokia.org/], which
evolved from Jmx4Perl and now have good support for [Javascript|http://www.jolokia.org/client/javascript.html],
[Perl|http://www.jolokia.org/client/perl.html] and [Python|https://github.com/cwood/pyjolokia]
with (relatively) extensive [documentation|http://www.jolokia.org/documentation.html]. The
software is Apache 2.0 licensed with artifacts published in maven central. It's trivial to
enable jolokia as a servlet, so that the hadoop-auth SPNEGO plugin can be used (see demo patch).
JMX is not just a protocol. It's a management infrastructure and enable client to browse,
discover and use new management services without client modification/recompilation. Jolokia
supports list/search/read/write/execute of management services and more (bulk requests). 
It even has a readline enabled shell (j4psh) to interact with JMX with no client JVM startup
overhead. Significant work will be required to build a custom REST API and client libraries
for different languages. Even when some of the Hadoop daemons are rewritten in other languages,
these daemons can implement a subset of Jolokia REST/JSON API to take advantage of the rich
client support for admin friendly languages (python and especially perl).

Interoperability scores (0: poor, 1: fair, 2: good): JMX (with Jolokia): 2; custom REST: 1;
custom RPC: 0.

*Compatibility* is important for management protocols. It's inconvenient/tedious for management
tools to manage services that have incompatible management protocols across different versions
of managed services. JMX is a stable industry standard for more than a decade. Jolokia has
been evolved for 3 years with relatively stable REST APIs. A new custom REST API will need
similar time frame to evolve, though it's doable albeit tedious to implement compatibility
on the client side for REST APIs if the protocol supports version requests (which Jolokia
has). Compatibility is not supported for stable branches of Hadoop. It's also hard to maintain
compatibility of Hadoop RPC even for branch-2 and up, because the protocol is not mature enough
(see discussions in HADOOP-9151). The major concerns for Hadoop RPC are performance and quality
of service and we just started to address the latter (HADOOP-9194). It's highly probable that
the next version of Hadoop RPC will be incompatible with the 2.0.x releases. Separation of
concerns is a hallmark of good software design. While the performance/compatibility trade-off
needs to be made for application protocols, it's unnecessary for management protocols.

Compatibility scores: JMX: 2; custom REST: 1; custom RPC: 0.


*Isolation* is a somewhat unique requirement for management protocols. Seasoned ops will not
setup a datacenter without serial console (and/or IPMI) access to every machine, as we cannot
trust managed software (even OSes) to be perfect.  It's common to see high RPC latency (up
to minutes) on busy hadoop daemons (especially on JT and occasionally NN). It's very desirable
for the managed services to respond to resource management requests quickly (<=1s latency)
in spite of the user load. On daemons with a http server already serving user load, we need
to start a separate http server for the custom REST API for good isolation, due to the potential
thread pool exhaustion for user load (shuffle, webhdfs etc.). JMX MBeanServer is built in
every java process and can respond quickly to admin requests even if user load are otherwise
high, as long as the node is not swapping. We can also have a special Hadoop RPC server dedicated
to the custom RPC, which is better than sharing the RPC server with user load. Though it still
has lower isolation, in case of the bugs in RPC code.

Isolation scores: JMX: 2 if on RMI or separate http server with Jolokia, 1 otherwise; custom
REST: 2 if separate server, 1 otherwise; custom RPC: 1 if separate server, 0 otherwise.

*Security* is important for obvious reasons. JMX security setup (SSL with password or client
cert) is well documented and trivial if the client is inside the firewall (otherwise RMI complicates
the setup), which is the common case.  The setup is uniform across all components supporting
JMX. For most enterpise clusters, JMX security is a sunk cost. Further more, Jolokia enables
hadoop servlet auth filter plugin to support various authentication schemes including SPNEGO/Kerberos.
People can enable Jolokia and disable JMX remote for convenience (on Hadoop only clusters)
or enable both for even better isolation guarantees. The choices should be our users (sysadmins).
JMX supports arbitrary role based authentication and authorization. Jolokia also support fine
grained authorization configuration to restrict access at MBean granularities. These functionalities
don't exist yet for either the custom REST or custom RPC approaches.

Security scores: JMX: 2; custom REST: 1; custom RPC: 1;

*Ease of implementation* is directly linked to the maintenance cost. I've already demonstrated
ease of implementation of JMX services with much better tooling (including the ubiquitous
jconsole for dev/debug) than custom REST or custom RPC, with the latter having the most verbose/tedious
implementation with code generations. A custom REST API or RPC solution will require major
effort to even begin to match the functionality offered by JMX/Jolokia _now_, especially considering
the client support and documentation. Less code is required to enable both JMX and REST/JSON
API via Jolokia with much better documentation and support besides the Hadoop community than
either custom REST or custom RPC approaches.

Ease of implementation scores: JMX: 2; custom REST: 1; custom RPC: 0

Based on the above analysis, the best management protocol for Hadoop going forward is JMX/Jolokia,
which can also vastly improve the manageability of Hadoop in stable branches without breaking
compatibility.

The demo patch shows that the code to optionally enable such great improvement of manageability
is trivial.

                
> Adopt JMX for management protocols
> ----------------------------------
>
>                 Key: HADOOP-9160
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9160
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Luke Lu
>
> Currently we use Hadoop RPC (and some HTTP, notably fsck) for admin protocols. We should
consider adopt JMX for future admin protocols, as it's the industry standard for java server
management with wide client support.
> Having an alternative/redundant RPC mechanism is very desirable for admin protocols.
I've seen in the past in multiple cases, where NN and/or JT RPC were locked up solid due to
various bugs and/or RPC thread pool exhaustion, while HTTP and/or JMX worked just fine.
> Other desirable benefits include admin protocol backward compatibility and introspectability,
which is convenient for a centralized management system to manage multiple Hadoop clusters
of different versions. Another notable benefit is that it's much easier to implement new admin
commands in JMX (especially with MXBean) than Hadoop RPC, especially in trunk (as well as
0.23+ and 2.x).
> Since Hadoop RPC doesn't guarantee backward compatibility (probably not ever for branch-1),
there are few external tools depending on it. We can keep the old protocols for as long as
needed. New commands should be in JMX. The transition can be gradual and backward-compatible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message