Date: Tue, 24 Mar 2015 01:13:54 +0000 (UTC)
From: "Wangda Tan (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI

[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377040#comment-14377040 ]

Wangda Tan commented on YARN-2901:
----------------------------------

Hi [~vvasudev],

I spent some time looking at the Log4jMetricsAppender implementation (I will cover the other modified components in the next round).

1) Log4jMetricsAppender
1.1 Would it be better to place it in yarn-server-common?
1.2 If you agree with the above, how about putting it into the package o.a.h.y.server.metrics (or utils)?
1.3 Rename it to Log4jWarnErrorMetricsAppender?
1.4 Comments about the implementation: I think the cleanup can be improved. Right now the cutoff of messages/counts basically loops over all stored items, which could be inefficient (imagine the number of stored messages exceeding the threshold). The existing logic in the patch could also leave a large number of messages stored, since tons of messages can be generated in 5 minutes, which is the interval at which the purge task runs.

If you make the data structure a SortedMap of SortedMaps for errors (and likewise for warnings), where the outer map is sorted by value (the SortedMap with the smallest timestamp goes first) and the inner map is sorted by key (smallest timestamp goes first), the purge can happen whenever we add an event. It will take at most log(N=500) time to do the purge, and no extra timer task is needed.

To make a SortedMap sort by value, one way is described in http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java (first answer). Here each value is an inner SortedMap, and we can order the inner SortedMaps by the smallest key in each of them.

One corner case to consider: the same message can have many different timestamps, so we need to purge the inner SortedMap too.

For better code readability, you can wrap the inner SortedMap in an inner class such as MessageInfo.

> Add errors and warning stats to RM, NM web UI
> ---------------------------------------------
>
>                 Key: YARN-2901
>                 URL: https://issues.apache.org/jira/browse/YARN-2901
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch
>
>
> It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?)
> most common exceptions in the past 5 min/1 hour/12 hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm open to suggestions on alternate mechanisms for implementing this.)
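
The purge-on-add idea from comment 1.4 could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code in the patch: the class name PurgingMessageStore, its methods, and the key/value types (message text mapped to a timestamp-to-count TreeMap) are all hypothetical. Instead of a value-sorted outer map, it keeps a secondary TreeMap index from each message's oldest timestamp to the messages that have it, which gives the same "oldest first" ordering while staying cheap to update. It purges by time window only and does not cap the number of stored messages.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: stores message counts and purges old ones on every add. */
public class PurgingMessageStore {
  private final long windowMillis;
  // message -> (timestamp -> count); the inner map is sorted by timestamp.
  private final Map<String, TreeMap<Long, Integer>> perMessage = new HashMap<>();
  // oldest timestamp of a message -> messages with that oldest timestamp.
  private final TreeMap<Long, Set<String>> byOldest = new TreeMap<>();

  public PurgingMessageStore(long windowMillis) {
    this.windowMillis = windowMillis;
  }

  /** Records one occurrence, then drops everything older than the window. */
  public synchronized void add(String message, long timestamp) {
    TreeMap<Long, Integer> inner = perMessage.get(message);
    if (inner == null) {
      inner = new TreeMap<>();
      perMessage.put(message, inner);
    } else {
      unindex(message, inner.firstKey());
    }
    inner.merge(timestamp, 1, Integer::sum);
    index(message, inner.firstKey());
    purgeOlderThan(timestamp - windowMillis);
  }

  /** Total occurrences of a message still inside the window. */
  public synchronized int count(String message) {
    TreeMap<Long, Integer> inner = perMessage.get(message);
    if (inner == null) {
      return 0;
    }
    int total = 0;
    for (int c : inner.values()) {
      total += c;
    }
    return total;
  }

  public synchronized int distinctMessages() {
    return perMessage.size();
  }

  // Only messages whose oldest timestamp is below the cutoff are visited.
  private void purgeOlderThan(long cutoff) {
    while (!byOldest.isEmpty() && byOldest.firstKey() < cutoff) {
      Set<String> stale = byOldest.pollFirstEntry().getValue();
      for (String message : stale) {
        TreeMap<Long, Integer> inner = perMessage.get(message);
        // Purge the inner map too: one message can have many timestamps.
        inner.headMap(cutoff, false).clear();
        if (inner.isEmpty()) {
          perMessage.remove(message);
        } else {
          index(message, inner.firstKey());
        }
      }
    }
  }

  private void index(String message, long oldest) {
    byOldest.computeIfAbsent(oldest, k -> new HashSet<>()).add(message);
  }

  private void unindex(String message, long oldest) {
    Set<String> set = byOldest.get(oldest);
    set.remove(message);
    if (set.isEmpty()) {
      byOldest.remove(oldest);
    }
  }
}
```

With this layout no timer task is needed, because every add triggers the purge and the purge touches only entries that have actually expired. Wrapping the inner TreeMap in a small class such as the MessageInfo suggested above would be a natural next step for readability.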