hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6982) nntop: top­-like tool for name node users
Date Sat, 11 Oct 2014 00:22:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167851#comment-14167851

Andrew Wang commented on HDFS-6982:

Hi Maysam, thanks for sticking with this, and again apologies for the delay with reviewing
this. I especially appreciate the effort to javadoc everything, it helped a lot with reviewing

One thing that'd be helpful are some basic setup instructions so I can try it out. Since I
don't see any modifications to any existing files, I'm also wondering how this is exposed
to JMX or on the webUI.

Anyway, I did a light pass for mostly code-style and nitty stuff, hopefully I can do a more
high-level overview once the above is answered.

* Need to put {{<p/>}} tags in javadoc to get line breaks. Can also use {{@link}} to
refer to methods.
* Should use SLF4J logging in new classes, i.e. {{LoggerFactory.getLogger()}}, and use the
"{}" template to avoid string construction overhead for debug prints
* These new classes should probably all be annotated @InterfaceAudience.Private
* Bucket could be private rather than package-private
* Code style: if/for blocks should always have braces
* A few lines longer than 80 chars
* I generally prefer doing Precondition checks to manually throwing IllegalStateException,
since it's more concise.

* There's only a {{getDefaultRollingWindow}} class, no other ways of constructing a RollingWindow.
With this in mind, could we rename it to just {{getRollingWindow}}? Same for params like {{defaultWindowLen}}
and {{defaultBPW}}, which make me think of the configuration defaults.
* Would be good to do Precondition validation on the config params, e.g. greater than zero,
numBuckets less than windowLen, evenly divisible, etc.

* Could we put most of this into hdfs-site.xml, hdfs-default.xml, and DFSConfigKeys instead?
It'd also be good to have some light docs in hdfs-default.xml about how to configure this.
* If so, propose we use a prefix like "dfs.namenode.top."
* How do we configure multiple reporting periods? That TopMetrics constructor isn't called
right now. It'd also be good to unify getting and then validating all the configs in a single
place, like this class. See NNConf for an example of what I'm thinking of.
* Typo in config key: BUKCETS
* WEB_PORT and DEFAULT_WEB_PORT seem to be unused, TopConfiguration(Configuration) constructor
* getCmdTotal and getTopMetricsRecordPrefix static getters are only used in TopMetrics, that
might be a better home.
* Rather than MIN_2_MS, could we have a long array with the default periods, i.e. DEFAULT_REPORTING_PERIODS?
We already have a Configuration#getInts methods, could add a Configuration#getLongs method

* Typo: "TopMetic singlton"
* If you use SLF4J, you can oftentimes skip wrapping things in LOG.isDebugEnabled if you also
do the string templating
* getMetrics does the isDebugEnabled check, but inside does LOG.info
* report, we construct the permStr, but don't actually use it.
* report, I don't think we need the catch for {{Throwable t}}, no checked exceptions are being
* getReportingPeriods is unused

* This stuff isn't shared much, seems like we could just move things to where they're used

* Is this used?

* Let's do the init in a @BeforeClass block
* Rather than grabbing defaultWindowLen and defaultBPW directly, we should set them in the
config, pass that in when making the RWManager, then we can verify that in the test that they've
taken effect. Can then make these RWM variables private.

More tests:
* Could also test the JMX interface, e.g. in TestNameNodeMetrics

> nntop: top­-like tool for name node users
> -----------------------------------------
>                 Key: HDFS-6982
>                 URL: https://issues.apache.org/jira/browse/HDFS-6982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
> In this jira we motivate the need for nntop, a tool that, similarly to what top does
in Linux, gives the list of top users of the HDFS name node and gives insight about which
users are sending majority of each traffic type to the name node. This information turns out
to be the most critical when the name node is under pressure and the HDFS admin needs to know
which user is hammering the name node and with what kind of requests. Here we present the
design of nntop which has been in production at Twitter in the past 10 months. nntop proved
to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than
a few MB), and quite efficient for the write path (only two hash lookup for updating a metric).

This message was sent by Atlassian JIRA

View raw message