hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
Date Thu, 13 Nov 2014 00:42:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208995#comment-14208995 ]

Haohui Mai commented on HDFS-6982:

bq. However, my understanding is that there's no direct link between the alpha parameter and
a time-based window, e.g. 1 min, 5 min, 30 min.

Let n be the number of observations per window. Setting {{alpha = (n-1) / n}} would
make the math work out, assuming that the number of requests follows a Poisson distribution.
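
The comment does not spell out the update rule, so the sketch below assumes the standard exponentially weighted moving average (EWMA) form, value = alpha * value + (1 - alpha) * x, with hypothetical names `ewma_alpha`, `Ewma` and `mean_sample_age`. It illustrates why {{alpha = (n-1) / n}} ties alpha to a window of roughly n observations: the weighted mean age of the samples is n - 1 observations, and a sample's weight decays to about 1/e after n further observations. It does not model the Poisson assumption itself.

```python
import math


def ewma_alpha(n: int) -> float:
    """Hypothetical helper: smoothing factor alpha = (n - 1) / n for n observations per window."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) / n


class Ewma:
    """Exponentially weighted moving average; the newest sample gets weight 1 - alpha."""

    def __init__(self, n: int) -> None:
        self.alpha = ewma_alpha(n)
        self.value = 0.0

    def observe(self, x: float) -> float:
        # The old average decays by alpha; the new sample contributes with weight 1 - alpha = 1 / n.
        self.value = self.alpha * self.value + (1 - self.alpha) * x
        return self.value


def mean_sample_age(alpha: float) -> float:
    # The sample k steps back carries weight (1 - alpha) * alpha**k,
    # so the weighted mean age is alpha / (1 - alpha), i.e. n - 1 when alpha = (n - 1) / n.
    return alpha / (1 - alpha)


if __name__ == "__main__":
    avg = Ewma(n=10)
    for _ in range(200):
        avg.observe(5.0)  # constant rate of 5 requests per observation
    print(round(avg.value, 6))                         # 5.0
    print(round(mean_sample_age(ewma_alpha(60)), 6))   # 59.0
    print(round(ewma_alpha(1000) ** 1000, 3))          # 0.368, close to 1/e
```

With a steady input the average converges to the input rate, and the effective memory of the average scales linearly with n, which is the link between alpha and a time-based window when observations arrive at a fixed interval.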

bq. IIUC the situation you describe will lead to small errors, not big ones. If there are
bigger correctness issues, I think we can fix them by adding more synchronization. Thanks.

Depending on the timing, the outcome will be one of the following: (1) correct results,
(2) consistently missing one measurement from some users, or (3) inconsistent measurements
for the same users. These artificial errors make nntop less valuable.
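
The thread does not include the code under discussion, so the following is only a hypothetical illustration of how a timing-dependent counter can silently drop a request. `replay_read_then_reset_race` replays, step by step, one interleaving of a reporter that reads a per-user count and then resets it in two separate steps; a request landing in between is counted in neither window, as in outcome (2) above. `WindowedCounts` (also a hypothetical name) performs the read and the reset as one swap under the same lock as the increment, so every request is counted exactly once.

```python
import threading


def replay_read_then_reset_race() -> tuple[int, int]:
    """Replays one hypothetical interleaving of an unsynchronized read-then-reset."""
    counts = {"alice": 3}
    reported = counts["alice"]  # reporter: read the value for the closing window
    counts["alice"] += 1        # handler: alice's 4th request lands in between
    counts["alice"] = 0         # reporter: reset for the next window
    return reported, counts["alice"]  # (3, 0): the 4th request is in neither window


class WindowedCounts:
    """Per-user counts where increment and snapshot-and-reset share one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: dict[str, int] = {}

    def incr(self, user: str) -> None:
        with self._lock:
            self._counts[user] = self._counts.get(user, 0) + 1

    def snapshot_and_reset(self) -> dict[str, int]:
        with self._lock:
            # Swap in a fresh dict so no increment can fall between the read and the reset.
            snapshot, self._counts = self._counts, {}
        return snapshot
```

Summing every snapshot taken while writers are running, plus a final snapshot, always yields the total number of increments; the unsynchronized replay does not.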

I don't quite understand your concerns about fixing the issue. This is a variant of the online
counting problem, which is relatively well studied. Applying the de facto solution can eliminate
the errors and make the implementation simpler. I'm not sure why we need to reinvent the
wheel here.
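
The comment does not name the de facto solution. One widely used technique for online counting over a sliding time window is a ring of time buckets: each bucket remembers which time slice it belongs to and is lazily reset when its slot is reused. The sketch below is not the nntop code; the class name `RollingWindowCounter`, the millisecond timestamps, the bucket layout and the single lock are all assumptions made for illustration.

```python
import threading
from collections import defaultdict


class RollingWindowCounter:
    """Hypothetical per-user request counter over a sliding window split into buckets."""

    def __init__(self, window_ms: int, num_buckets: int) -> None:
        if num_buckets < 1 or window_ms <= 0 or window_ms % num_buckets:
            raise ValueError("window_ms must be a positive multiple of num_buckets")
        self.bucket_ms = window_ms // num_buckets
        self.num_buckets = num_buckets
        # Each slot holds (time slice it belongs to, user -> count); -1 marks an unused slot.
        self.buckets = [(-1, defaultdict(int)) for _ in range(num_buckets)]
        self.lock = threading.Lock()

    def _slice(self, now_ms: int) -> int:
        return now_ms // self.bucket_ms

    def increment(self, user: str, now_ms: int, delta: int = 1) -> None:
        current = self._slice(now_ms)
        idx = current % self.num_buckets
        with self.lock:
            bucket_slice, counts = self.buckets[idx]
            if bucket_slice > current:
                return  # stale timestamp, already outside the window; drop it
            if bucket_slice != current:
                # Slot reused for a new slice: reset it under the same lock as the increment.
                counts = defaultdict(int)
                self.buckets[idx] = (current, counts)
            counts[user] += delta

    def totals(self, now_ms: int) -> dict:
        current = self._slice(now_ms)
        oldest = current - self.num_buckets + 1
        result = defaultdict(int)
        with self.lock:
            for bucket_slice, counts in self.buckets:
                if oldest <= bucket_slice <= current:
                    for user, n in counts.items():
                        result[user] += n
        return dict(result)
```

For example, with `RollingWindowCounter(window_ms=60_000, num_buckets=6)`, two requests from alice at 0 s and 5 s and one from bob at 15 s give `{'alice': 2, 'bob': 1}` at 20 s, but only `{'bob': 1}` at 65 s, because alice's 0-10 s bucket has left the window. Since the reset and the increment happen under the same lock, a request cannot be counted into a bucket that is concurrently being cleared; a production version would likely use finer-grained or lock-free counters.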

> nntop: top-like tool for name node users
> -----------------------------------------
>                 Key: HDFS-6982
>                 URL: https://issues.apache.org/jira/browse/HDFS-6982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch,
HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf
> In this jira we motivate the need for nntop, a tool that, similarly to what top does
in Linux, lists the top users of the HDFS name node and gives insight into which
users are sending the majority of each traffic type to the name node. This information turns
out to be most critical when the name node is under pressure and the HDFS admin needs to know
which user is hammering the name node and with what kind of requests. Here we present the
design of nntop, which has been in production at Twitter for the past 10 months. nntop proved
to have low CPU overhead (< 2% in a cluster of 4K nodes), a low memory footprint (less than
a few MB), and a quite efficient write path (only two hash lookups to update a metric).
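
The description credits the write path's efficiency to two hash lookups per metric update. One layout consistent with that, assumed here rather than taken from the attached patches, is a map from operation to a map from user to counter. The names `TopMetrics` and `_Count` are hypothetical, and the sketch omits the time windowing, which a real implementation would keep per (operation, user) entry.

```python
from collections import defaultdict


class _Count:
    """Mutable cell so an update does not need a second store into the inner map."""
    __slots__ = ("value",)

    def __init__(self) -> None:
        self.value = 0


class TopMetrics:
    """Hypothetical two-level map: operation -> user -> request count."""

    def __init__(self) -> None:
        self.by_op = defaultdict(lambda: defaultdict(_Count))

    def record(self, op: str, user: str) -> None:
        # When both keys already exist: one hash lookup for the operation, one for the user.
        self.by_op[op][user].value += 1

    def top_users(self, op: str, k: int) -> list:
        counts = self.by_op.get(op, {})
        # Highest count first; ties broken by user name for a stable ranking.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1].value, kv[0]))
        return [(user, cell.value) for user, cell in ranked[:k]]
```

Recording `create` from alice three times, bob twice and carol once makes `top_users("create", 2)` return `[('alice', 3), ('bob', 2)]`. Ranking is only done on the read path, so the per-request cost stays at the two lookups.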

This message was sent by Atlassian JIRA
