hadoop-hdfs-issues mailing list archives

From "Maysam Yabandeh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
Date Mon, 13 Oct 2014 19:11:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169779#comment-14169779
] 

Maysam Yabandeh commented on HDFS-6982:
---------------------------------------

Thanks [~andrew.wang] for the detailed review. I will submit a new patch soon. In the
meantime, let me double-check a couple of points with you.

bq. Since I don't see any modifications to any existing files, I'm also wondering how this
is exposed to JMX or on the webUI.

You are right. I was not sure where the best place to integrate nntop with the NN is. I will
pick a place and we can update it later.

bq. There's only a {{getDefaultRollingWindow}} class, no other ways of constructing a RollingWindow.

The design doc envisions two interfaces for accessing the top users. One is jmx, which requires
a rolling window over only one reporting period, say 1 minute. Jmx data, however, is most useful
when it is integrated with an external graphing tool. To also allow users with small clusters
to benefit from the data computed by nntop, we also provide an html interface, which has no
graphing capability. This basic interface unfortunately does not give the viewer a sense of
*trend*. To compensate for that, the html page will show the top users over multiple time
periods, say 1, 5, and 25 minutes; that is why we have multiple rolling window periods in nntop.
One of them, however, is used for the jmx interface, and it is specified by {{getDefaultRollingWindow}}.
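
To illustrate the multiple-period idea, here is a minimal sketch of a bucketed rolling
window with one instance per reporting period. It is not the {{RollingWindow}} class from
the patch: the class name {{RollingWindowSketch}}, its methods, and the bucket count are
all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch; NOT the RollingWindow class from the HDFS-6982 patch. */
public class RollingWindowSketch {
    private final long bucketMs;       // time span covered by one bucket
    private final long[] bucketStart;  // start time of the data held in each bucket
    private final long[] counts;       // event count per bucket

    public RollingWindowSketch(long periodMs, int numBuckets) {
        if (numBuckets <= 0 || periodMs < numBuckets) {
            throw new IllegalArgumentException("period must cover at least 1 ms per bucket");
        }
        this.bucketMs = periodMs / numBuckets;
        this.bucketStart = new long[numBuckets];
        this.counts = new long[numBuckets];
        Arrays.fill(bucketStart, -1L);  // -1 marks a bucket that was never written
    }

    /** Adds delta to the bucket that covers nowMs, recycling the bucket if it is stale. */
    public synchronized void incr(long nowMs, long delta) {
        long start = nowMs - (nowMs % bucketMs);
        int i = (int) ((nowMs / bucketMs) % counts.length);
        if (bucketStart[i] != start) {  // bucket still holds data from an older lap
            bucketStart[i] = start;
            counts[i] = 0;
        }
        counts[i] += delta;
    }

    /** Sums the buckets whose start time still falls inside the window ending at nowMs. */
    public synchronized long sum(long nowMs) {
        long oldest = nowMs - bucketMs * counts.length;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStart[i] > oldest && bucketStart[i] <= nowMs) {
                total += counts[i];
            }
        }
        return total;
    }

    /** Builds one window per reporting period, e.g. 1, 5, and 25 minutes for the html view. */
    public static RollingWindowSketch[] forPeriods(long[] periodsMs, int numBuckets) {
        RollingWindowSketch[] windows = new RollingWindowSketch[periodsMs.length];
        for (int i = 0; i < periodsMs.length; i++) {
            windows[i] = new RollingWindowSketch(periodsMs[i], numBuckets);
        }
        return windows;
    }
}
```

In this sketch every operation would be reported to all windows, the html view would read
all of them, and the jmx view would read only the one chosen as the default.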

About the html interface, I excluded it from this patch for two reasons. First, I figured
it is better to keep this patch as small as possible and work on the html interface patch
in a separate jira. Second, I had previously used the yarn html utils, and I will
have to rewrite that part using html utils which are standard to the hdfs project.

bq. How do we configure multiple reporting periods?

Via some conf params. I will make sure that the docs reflect that properly.

bq. WEB_PORT and DEFAULT_WEB_PORT seem to be unused

You are right. They are supposed to be used by the html interface, so I should remove them
from this patch.

bq. getCmdTotal and getTopMetricsRecordPrefix static getters are only used in TopMetrics,
that might be a better home.

They will later be used by the html interface as well. The html interface will show the total
operations on top and then the details of each command afterwards.

bq. Rather than MIN_2_MS, could we have a long array with the default periods, i.e. DEFAULT_REPORTING_PERIODS?

In addition to the previous explanation about multiple reporting periods for the html view,
I should add that the reporting periods are expected to be specified in the conf file. I dropped
the method that reads them from the conf file from the patch since it was invoked only via
the html interface. But I guess I should put it back to avoid confusion.
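
As a rough sketch of what reading the periods from configuration with a default array
could look like: the key name {{nntop.reporting.periods.minutes}}, the class name, and the
default values (taken from the "1, 5, 25 minutes" example above) are hypothetical.
{{java.util.Properties}} stands in for the Hadoop configuration object so that the example
is self-contained.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; the key name and defaults are assumptions, not taken from the patch. */
public class ReportingPeriodsSketch {
    /** Hypothetical configuration key holding a comma-separated list of minutes. */
    public static final String PERIODS_KEY = "nntop.reporting.periods.minutes";

    /** Assumed defaults of 1, 5, and 25 minutes, expressed in milliseconds. */
    public static final long[] DEFAULT_REPORTING_PERIODS = {60_000L, 300_000L, 1_500_000L};

    /** Returns the configured periods in milliseconds, or the defaults if the key is unset. */
    public static long[] readPeriodsMs(Properties conf) {
        String raw = conf.getProperty(PERIODS_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_REPORTING_PERIODS.clone();
        }
        String[] parts = raw.split(",");
        long[] periodsMs = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            long minutes = Long.parseLong(parts[i].trim());
            if (minutes <= 0) {
                throw new IllegalArgumentException("reporting period must be positive: " + minutes);
            }
            periodsMs[i] = TimeUnit.MINUTES.toMillis(minutes);
        }
        return periodsMs;
    }
}
```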

bq. report, we construct the permStr, but don't actually use it.

You are right. I can actually drop src, dst, and also status. At the beginning, the vision
for nntop was to also report hot directories, etc., and that is why we kept the full details
in the report method. But I guess we can always put such details back if at some point those
visions were to be pursued.

bq. report, I don't think we need the catch for Throwable t, no checked exceptions are being
thrown?

The idea was that any unexpected problem caused by a programming bug in nntop should not crash
the name node.
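
A minimal sketch of that defensive pattern, assuming a hypothetical {{MetricsSink}} callback
in place of the real nntop bookkeeping (none of these names come from the patch):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of isolating metrics bugs from the caller; not the patch's code. */
public class SafeReportSketch {
    /** Hypothetical stand-in for whatever records the per-user, per-command metric. */
    public interface MetricsSink {
        void record(String user, String cmd);
    }

    private final MetricsSink sink;
    private final AtomicLong failures = new AtomicLong();

    public SafeReportSketch(MetricsSink sink) {
        this.sink = sink;
    }

    /** Records the operation; a bug in the sink is counted but never propagated. */
    public void report(String user, String cmd) {
        try {
            sink.record(user, cmd);
        } catch (Throwable t) {
            // A real implementation would also log t; here the failure is only counted.
            failures.incrementAndGet();
        }
    }

    public long failureCount() {
        return failures.get();
    }
}
```

Note that catching {{Throwable}} also swallows {{Error}} subclasses such as
{{OutOfMemoryError}}; catching {{RuntimeException}} instead would be a narrower alternative.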

bq.  TopUtil: This stuff isn't shared much, seems like we could just move things to where
they're used

TopUtil was much fatter when it also included the html view util functions. The html view
will also be a user of TopUtil.

bq. TopMetricsCollector: Is this used?
 
Yeah, by the html view. I should drop it from this patch.

> nntop: top-like tool for name node users
> -----------------------------------------
>
>                 Key: HDFS-6982
>                 URL: https://issues.apache.org/jira/browse/HDFS-6982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what top does
in Linux, gives the list of top users of the HDFS name node and gives insight into which
users are sending the majority of each traffic type to the name node. This information turns out
to be most critical when the name node is under pressure and the HDFS admin needs to know
which user is hammering the name node and with what kind of requests. Here we present the
design of nntop, which has been in production at Twitter for the past 10 months. nntop proved
to have low cpu overhead (< 2% in a cluster of 4K nodes), a low memory footprint (less than
a few MB), and an efficient write path (only two hash lookups for updating a metric).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
