hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI
Date Mon, 30 Mar 2015 18:55:55 GMT

    [ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387203#comment-14387203 ]

Wangda Tan commented on YARN-2901:
----------------------------------

Hi Varun,
Thanks for updating the patch. I took another look and it's much more efficient now. A few
comments on Log4jWarningErrorAppender:
- It's better to use ReentrantReadWriteLock instead of a synchronized lock, since the class
will be read far more frequently, and more concurrently, than it is written.
- {{getElementsAndCounts}}: to get the count for each message, it loops over the elements
in qualifyingTimes. In the extreme case where message="x" is logged once every second, it
will loop over all 24 * 60 * 60 * 2 = 172800 items. A full solution could be complex. Either
we stop remembering a per-second count and hard-code a small set of ranges per message (past
1 min, past 5 min, past 30 min, past 1 h, ...), or we introduce a tree-like structure such as
an interval tree (http://en.wikipedia.org/wiki/Interval_tree) to make the query more efficient.
I think remembering a limited number of time ranges for each message should be enough; there
seems to be no need to support queries like "give me the error count from 2:30:01 am to
3:40:05 pm". If you think the changes are manageable, you can make them in this patch;
otherwise you can file a ticket to address them in a separate patch.
- Cleanup is triggered when map.size() > maxUniqueMessages. I suggest adding a buffer so
that cleanup does not run too often, for example triggering it only when map.size() >
maxUniqueMessages * 2.
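
To illustrate the locking point, here is a minimal sketch of a read-write lock around a
shared count map. MessageCounts and its methods are hypothetical names made up for this
example; this is not the appender code from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class for illustration only; not the appender in the patch.
class MessageCounts {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Integer> counts = new HashMap<>();

  // Writers take the exclusive write lock.
  void record(String message) {
    lock.writeLock().lock();
    try {
      counts.merge(message, 1, Integer::sum);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Readers share the read lock, so several reads can proceed concurrently.
  int get(String message) {
    lock.readLock().lock();
    try {
      return counts.getOrDefault(message, 0);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Returns a copy so callers never touch the guarded map directly.
  Map<String, Integer> snapshot() {
    lock.readLock().lock();
    try {
      return new HashMap<>(counts);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

With a synchronized method, two concurrent get() calls would serialize; here they only
block while a record() call holds the write lock.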
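
For the limited-time-ranges idea, one possible shape is a fixed ring of per-minute buckets
per message, so a query for the past 1, 5, 30 or 60 minutes sums at most 60 slots instead
of up to 172800 per-second entries. This is only a sketch under assumptions: MinuteBuckets
is a hypothetical name, timestamps are non-negative epoch seconds, the largest range is
1 hour, and windows are aligned to minute boundaries rather than exact to the second.

```java
import java.util.Arrays;

// Hypothetical per-message counter for illustration; not code from the patch.
class MinuteBuckets {
  private static final int SLOTS = 60;                  // assumed largest range: past 1 h
  private final long[] minuteOfSlot = new long[SLOTS];  // epoch minute each slot currently holds
  private final int[] counts = new int[SLOTS];

  MinuteBuckets() {
    Arrays.fill(minuteOfSlot, -1L);                     // -1 marks an unused slot
  }

  // Record one occurrence at the given time (non-negative epoch seconds assumed).
  void record(long epochSeconds) {
    long minute = epochSeconds / 60;
    int slot = (int) (minute % SLOTS);
    if (minuteOfSlot[slot] != minute) {                 // slot holds an older minute: reuse it
      minuteOfSlot[slot] = minute;
      counts[slot] = 0;
    }
    counts[slot]++;
  }

  // Occurrences in the last windowMinutes minutes, current minute included.
  int countInLast(int windowMinutes, long nowEpochSeconds) {
    if (windowMinutes < 1 || windowMinutes > SLOTS) {
      throw new IllegalArgumentException("window must be between 1 and " + SLOTS);
    }
    long nowMinute = nowEpochSeconds / 60;
    int total = 0;
    for (long m = nowMinute - windowMinutes + 1; m <= nowMinute; m++) {
      if (m < 0) {
        continue;                                       // before the epoch: nothing recorded
      }
      int slot = (int) (m % SLOTS);
      if (minuteOfSlot[slot] == m) {                    // ignore slots holding stale minutes
        total += counts[slot];
      }
    }
    return total;
  }
}
```

The cost of a query is bounded by SLOTS regardless of how often the message is logged. The
trade-off is that arbitrary ranges such as "2:30:01 am to 3:40:05 pm" cannot be answered.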
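
The cleanup buffer could look roughly like the sketch below: the map is allowed to grow to
twice maxUniqueMessages before it is trimmed back to maxUniqueMessages in one pass.
BufferedCleanupMap is a hypothetical name, and evicting the lowest-count messages first is
an assumption made for the example, not necessarily the policy used in the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only; not code from the patch.
class BufferedCleanupMap {
  private final int maxUniqueMessages;
  private final Map<String, Integer> counts = new HashMap<>();
  private int cleanupRuns = 0;

  BufferedCleanupMap(int maxUniqueMessages) {
    this.maxUniqueMessages = maxUniqueMessages;
  }

  void record(String message) {
    counts.merge(message, 1, Integer::sum);
    // Only clean up once the map is past twice the limit, so cleanup
    // does not fire on every insert once the limit has been reached.
    if (counts.size() > maxUniqueMessages * 2) {
      cleanup();
    }
  }

  // Trim back to maxUniqueMessages, dropping the lowest counts first (assumed policy).
  private void cleanup() {
    cleanupRuns++;
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
    entries.sort(Map.Entry.comparingByValue());
    int toRemove = counts.size() - maxUniqueMessages;
    for (int i = 0; i < toRemove; i++) {
      counts.remove(entries.get(i).getKey());
    }
  }

  boolean contains(String message) {
    return counts.containsKey(message);
  }

  int size() {
    return counts.size();
  }

  int cleanupRuns() {
    return cleanupRuns;
  }
}
```

Each cleanup then frees room for at least maxUniqueMessages new unique messages before the
next one runs, at the cost of temporarily holding up to twice the configured limit.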

And could you take a look at the findbugs warnings and the failed tests?

Wangda

> Add errors and warning stats to RM, NM web UI
> ---------------------------------------------
>
>                 Key: YARN-2901
>                 URL: https://issues.apache.org/jira/browse/YARN-2901
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch
>
>
> It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm open to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
