Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E488017B4B for ; Sat, 11 Oct 2014 00:22:37 +0000 (UTC) Received: (qmail 96778 invoked by uid 500); 11 Oct 2014 00:22:37 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 96720 invoked by uid 500); 11 Oct 2014 00:22:37 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 96707 invoked by uid 99); 11 Oct 2014 00:22:37 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 11 Oct 2014 00:22:37 +0000 Date: Sat, 11 Oct 2014 00:22:37 +0000 (UTC) From: "Andrew Wang (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: =?utf-8?Q?[jira]_[Commented]_(HDFS-6982)_nntop?= =?utf-8?Q?:_top=C2=AD-like_tool_for_name_node_users?= MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-6982?page=3Dcom.atlassian.= jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D14167= 851#comment-14167851 ]=20 Andrew Wang commented on HDFS-6982: ----------------------------------- Hi Maysam, thanks for sticking with this, and again apologies for the delay= with reviewing this. I especially appreciate the effort to javadoc everyth= ing, it helped a lot with reviewing this. One thing that'd be helpful are some basic setup instructions so I can try = it out. Since I don't see any modifications to any existing files, I'm also= wondering how this is exposed to JMX or on the webUI. Anyway, I did a light pass for mostly code-style and nitty stuff, hopefully= I can do a more high-level overview once the above is answered. Nits: * Need to put {{

}} tags in javadoc to get line breaks. Can also use {{@= link}} to refer to methods. * Should use SLF4J logging in new classes, i.e. {{LoggerFactory.getLogger()= }}, and use the "{}" template to avoid string construction overhead for deb= ug prints * These new classes should probably all be annotated @InterfaceAudience.Pri= vate * Bucket could be private rather than package-private * Code style: if/for blocks should always have braces * A few lines longer than 80 chars * I generally prefer doing Precondition checks to manually throwing Illegal= StateException, since it's more concise. RollingWindowManager: * There's only a {{getDefaultRollingWindow}} class, no other ways of constr= ucting a RollingWindow. With this in mind, could we rename it to just {{get= RollingWindow}}? Same for params like {{defaultWindowLen}} and {{defaultBPW= }}, which make me think of the configuration defaults. * Would be good to do Precondition validation on the config params, e.g. gr= eater than zero, numBuckets less than windowLen, evenly divisible, etc. TopConfiguration: * Could we put most of this into hdfs-site.xml, hdfs-default.xml, and DFSCo= nfigKeys instead? It'd also be good to have some light docs in hdfs-default= .xml about how to configure this. * If so, propose we use a prefix like "dfs.namenode.top." * How do we configure multiple reporting periods? That TopMetrics construct= or isn't called right now. It'd also be good to unify getting and then vali= dating all the configs in a single place, like this class. See NNConf for a= n example of what I'm thinking of. * Typo in config key: BUKCETS * WEB_PORT and DEFAULT_WEB_PORT seem to be unused, TopConfiguration(Configu= ration) constructor too * getCmdTotal and getTopMetricsRecordPrefix static getters are only used in= TopMetrics, that might be a better home. * Rather than MIN_2_MS, could we have a long array with the default periods= , i.e. DEFAULT_REPORTING_PERIODS? We already have a Configuration#getInts m= ethods, could add a Configuration#getLongs method too. TopMetrics: * Typo: "TopMetic singlton" * If you use SLF4J, you can oftentimes skip wrapping things in LOG.isDebugE= nabled if you also do the string templating * getMetrics does the isDebugEnabled check, but inside does LOG.info * report, we construct the permStr, but don't actually use it. * report, I don't think we need the catch for {{Throwable t}}, no checked e= xceptions are being thrown? * getReportingPeriods is unused TopUtil * This stuff isn't shared much, seems like we could just move things to whe= re they're used TopMetricsCollector * Is this used? TestRWM: * Let's do the init in a @BeforeClass block * Rather than grabbing defaultWindowLen and defaultBPW directly, we should = set them in the config, pass that in when making the RWManager, then we can= verify that in the test that they've taken effect. Can then make these RWM= variables private. More tests: * Could also test the JMX interface, e.g. in TestNameNodeMetrics > nntop: top=C2=AD-like tool for name node users > ----------------------------------------- > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: Maysam Yabandeh > Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1= .pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to wh= at top does in Linux, gives the list of top users of the HDFS name node and= gives insight about which users are sending majority of each traffic type = to the name node. This information turns out to be the most critical when t= he name node is under pressure and the HDFS admin needs to know which user = is hammering the name node and with what kind of requests. Here we present = the design of nntop which has been in production at Twitter in the past 10 = months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K node= s), low memory footprint (less than a few MB), and quite efficient for the = write path (only two hash lookup for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)