hadoop-common-dev mailing list archives

From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4594) Monitoring Scripts for Nagios
Date Sat, 08 Nov 2008 18:02:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646010#action_12646010 ]

Edward Capriolo commented on HADOOP-4594:

I have read up on Chukwa and HADOOP-3628, and I think ping() would work well with Nagios. My goal
was to provide something that works today. I agree that jps and grep is not a great way to
monitor anything, but I also believe in the 80/20 rule. Checking the reply of each component's
web interface is a step better.
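
As a rough illustration of the web interface check, here is a minimal Python sketch. It assumes the usual Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The function name check_web_interface, its parameters, and the choice to treat any non-200 reply as a warning are assumptions made for this example; they are not part of the attached patch.

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_web_interface(url, timeout=10, opener=urllib.request.urlopen):
    """Return (exit_code, message) for one component web interface.

    `opener` is injectable (hypothetical parameter) so the check can be
    exercised without a running cluster.
    """
    try:
        response = opener(url, timeout=timeout)
        status = response.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return CRITICAL, "CRITICAL - %s returned HTTP %d" % (url, err.code)
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, timeout, and similar.
        return CRITICAL, "CRITICAL - %s unreachable: %s" % (url, err)
    if status == 200:
        return OK, "OK - %s returned HTTP 200" % url
    return WARNING, "WARNING - %s returned HTTP %d" % (url, status)
```

A wrapper script would print the returned message and exit with the returned code, which is what Nagios reads. The URL passed in would be whatever address the component's web interface listens on in a given installation.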

I was thinking checks like these might be useful without being complicated.

NumberOfDeadNodes > X -- This alarm would go off if the number of dead nodes in the cluster
goes higher than X

PercentageOfDeadNodes > X -- This would alarm if the % of dead nodes goes higher than X

WriteFileReadFile (String hdfspath) -- This would attempt to write and then read a file.

ReadFile (String hdfspath) -- This would attempt to read a file.

TotalFreeDFSPercent < X -- This would alarm when the free DFS space falls below a certain value.

These are some things that someone in an administrative role would want. 
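
The threshold checks above (dead node count, dead node percentage, free DFS space) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names and parameters are hypothetical, the live/dead node counts and byte totals are assumed to have been gathered elsewhere (how they are collected is outside this sketch), and a breached threshold is mapped straight to CRITICAL using the usual Nagios exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_dead_nodes(dead, max_dead):
    """NumberOfDeadNodes > X: alarm when more than max_dead nodes are dead."""
    message = "%d dead nodes (limit %d)" % (dead, max_dead)
    if dead > max_dead:
        return CRITICAL, "CRITICAL - " + message
    return OK, "OK - " + message


def check_dead_percentage(live, dead, max_percent):
    """PercentageOfDeadNodes > X: alarm when the dead share exceeds max_percent."""
    total = live + dead
    if total == 0:
        # Without any reported nodes a percentage is undefined.
        return UNKNOWN, "UNKNOWN - no nodes reported"
    percent = 100.0 * dead / total
    message = "%.1f%% of nodes dead (limit %.1f%%)" % (percent, max_percent)
    if percent > max_percent:
        return CRITICAL, "CRITICAL - " + message
    return OK, "OK - " + message


def check_free_dfs_percent(remaining_bytes, capacity_bytes, min_percent):
    """Free DFS percent < X: alarm when free space drops below min_percent."""
    if capacity_bytes <= 0:
        return UNKNOWN, "UNKNOWN - no DFS capacity reported"
    percent = 100.0 * remaining_bytes / capacity_bytes
    message = "%.1f%% DFS free (minimum %.1f%%)" % (percent, min_percent)
    if percent < min_percent:
        return CRITICAL, "CRITICAL - " + message
    return OK, "OK - " + message
```

Each function returns an exit code and a one-line message, so a small wrapper can print the message and exit with the code. Keeping the thresholds as arguments lets an administrator pick X per cluster rather than hard-coding it.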

> Monitoring Scripts for Nagios
> -----------------------------
>                 Key: HADOOP-4594
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4594
>             Project: Hadoop Core
>          Issue Type: Wish
>            Reporter: Edward Capriolo
>            Priority: Minor
>         Attachments: HADOOP-4594.patch
> I would like to create a set of local (via NRPE) and remote check scripts that can be shipped
with the Hadoop distribution and used to monitor Hadoop. I have already completed the NRPE
scripts. The second set of scripts would use wget to read the output of the Hadoop web interfaces.
Do these already exist?
> I guess these would fall under a new contrib project.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
