hadoop-common-dev mailing list archives

From "Philip Zeyliger (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-5469) Exposing Hadoop metrics via HTTP
Date Thu, 12 Mar 2009 01:37:50 GMT
Exposing Hadoop metrics via HTTP
--------------------------------

                 Key: HADOOP-5469
                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
             Project: Hadoop Core
          Issue Type: New Feature
          Components: metrics
            Reporter: Philip Zeyliger


I'd like to be able to query Hadoop's metrics via HTTP, e.g., by going to "/metrics" on any
Hadoop daemon that has an HttpServer.  My motivation is pretty simple--if you're running on
a lot of machines, tracking down the relevant metrics files is pretty time-consuming; this
would be a useful debugging utility.  I'd also like the output to be parseable, so I could
write a quick web app to query the metrics dynamically.

This is similar in spirit to, but different from, just using JMX.  (See also HADOOP-4756.)  JMX
requires a client and, more annoyingly, requires setting up authentication.  If you just
disable authentication, someone can do Bad Things, and if you enable it, you have to worry
about yet another password.  The HTTP approach is also more complete: JMX requires separate
instrumentation, so, for example, the JobTracker's metrics aren't exposed via JMX.

To get the discussion going, I've attached a patch.  I had to add a method to ContextFactory
to get all the active MetricsContexts, implement a do-little MetricsContext that simply inherits
from AbstractMetricsContext, add a method to MetricsContext to get all the records, expose
copy methods for the maps in OutputRecord, and implement an easy servlet.  I ended up removing
some common code from all MetricsContexts, for setting the period; I'm open to taking that out
if it muddies the patch significantly.
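
For illustration only, here is a minimal sketch of how a servlet could walk contexts, records, tags and metrics and emit the indented text layout used in the sample output in this message.  It is not the attached patch and does not use the Hadoop classes named above: MetricsTextSketch, RecordSnapshot and render are hypothetical names, and plain Java maps stand in for whatever ContextFactory and MetricsContext would actually return.  Sorted maps are assumed because the sample output appears to be sorted by name.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MetricsTextSketch {

    /** Hypothetical stand-in for one record: a copy of its tags and its metric values. */
    static final class RecordSnapshot {
        final Map<String, String> tags;
        final Map<String, Number> metrics;

        RecordSnapshot(Map<String, String> tags, Map<String, Number> metrics) {
            // Defensive, sorted copies so output order is stable.
            this.tags = new TreeMap<>(tags);
            this.metrics = new TreeMap<>(metrics);
        }
    }

    /** Renders context name -> record name -> records as indented text. */
    static String render(Map<String, Map<String, List<RecordSnapshot>>> contexts) {
        StringBuilder out = new StringBuilder();
        Map<String, Map<String, List<RecordSnapshot>>> sortedContexts = new TreeMap<>(contexts);
        for (Map.Entry<String, Map<String, List<RecordSnapshot>>> ctx : sortedContexts.entrySet()) {
            out.append(ctx.getKey()).append('\n');
            Map<String, List<RecordSnapshot>> sortedRecords = new TreeMap<>(ctx.getValue());
            for (Map.Entry<String, List<RecordSnapshot>> rec : sortedRecords.entrySet()) {
                out.append("  ").append(rec.getKey()).append('\n');
                for (RecordSnapshot snapshot : rec.getValue()) {
                    // Map.toString() already yields the {k=v, k=v} form.
                    out.append("    ").append(snapshot.tags).append('\n');
                    for (Map.Entry<String, Number> m : snapshot.metrics.entrySet()) {
                        out.append("      ").append(m.getKey())
                           .append('=').append(m.getValue()).append('\n');
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("hostName", "doorstop.local");
        tags.put("processName", "JobTracker");
        tags.put("sessionId", "");
        Map<String, Number> metrics = new TreeMap<>();
        metrics.put("gcCount", 22);
        metrics.put("gcTimeMillis", 68);

        Map<String, List<RecordSnapshot>> records = new TreeMap<>();
        records.put("metrics", Collections.singletonList(new RecordSnapshot(tags, metrics)));
        Map<String, Map<String, List<RecordSnapshot>>> contexts = new TreeMap<>();
        contexts.put("jvm", records);

        System.out.print(render(contexts));
    }
}
```

A real servlet would write the same string to the HTTP response; the rendering is kept separate here so it can be exercised without a servlet container.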

I'd love to hear your suggestions.  There's a bug in the JSON representation, and there's
some gross type-handling.

The patch is missing tests.  I wanted to post to gather feedback before I got too far, but
tests are forthcoming.

Here's a sample output for a job tracker, while it was running a "pi" job:

{noformat}
jvm
  metrics
    {hostName=doorstop.local, processName=JobTracker, sessionId=}
      gcCount=22
      gcTimeMillis=68
      logError=0
      logFatal=0
      logInfo=52
      logWarn=0
      memHeapCommittedM=7.4375
      memHeapUsedM=4.2150116
      memNonHeapCommittedM=23.1875
      memNonHeapUsedM=18.438614
      threadsBlocked=0
      threadsNew=0
      threadsRunnable=7
      threadsTerminated=0
      threadsTimedWaiting=8
      threadsWaiting=15
mapred
  job
    {counter=Map input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=2.0
    {counter=Map output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Data-local map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Map input bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=48.0
    {counter=FILE_BYTES_WRITTEN, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=148.0
    {counter=Combine output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
    {counter=Launched map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=HDFS_BYTES_READ, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=236.0
    {counter=Map output bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=64.0
    {counter=Launched reduce tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=1.0
    {counter=Spilled Records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Combine input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
  jobtracker
    {hostName=doorstop.local, sessionId=}
      jobs_completed=0
      jobs_submitted=1
      maps_completed=2
      maps_launched=5
      reduces_completed=0
      reduces_launched=1
rpc
  metrics
    {hostName=doorstop.local, port=50030}
      NumOpenConnections=2
      RpcProcessingTime_avg_time=0
      RpcProcessingTime_num_ops=84
      RpcQueueTime_avg_time=1
      RpcQueueTime_num_ops=84
      callQueueLen=0
      getBuildVersion_avg_time=0
      getBuildVersion_num_ops=1
      getJobProfile_avg_time=0
      getJobProfile_num_ops=17
      getJobStatus_avg_time=0
      getJobStatus_num_ops=32
      getNewJobId_avg_time=0
      getNewJobId_num_ops=1
      getProtocolVersion_avg_time=0
      getProtocolVersion_num_ops=2
      getSystemDir_avg_time=0
      getSystemDir_num_ops=2
      getTaskCompletionEvents_avg_time=0
      getTaskCompletionEvents_num_ops=19
      heartbeat_avg_time=5
      heartbeat_num_ops=9
      submitJob_avg_time=0
      submitJob_num_ops=1
{noformat}
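
Since the stated goal is parseable output, here is a rough sketch of how a client might read the plain-text layout above back into structured values.  It rests on assumptions the sample suggests but the issue does not specify: two spaces of indentation per level (context, record, tag set, metric), each tag set on a single unwrapped line, and no ", " inside tag values.  MetricsTextParser and Metric are hypothetical names, not part of Hadoop or of the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetricsTextParser {

    /** One parsed metric together with the context, record and tags it belongs to. */
    static final class Metric {
        final String context;
        final String record;
        final Map<String, String> tags;
        final String name;
        final String value; // kept as text; the sample mixes integers and decimals

        Metric(String context, String record, Map<String, String> tags,
               String name, String value) {
            this.context = context;
            this.record = record;
            this.tags = tags;
            this.name = name;
            this.value = value;
        }
    }

    static List<Metric> parse(String text) {
        List<Metric> out = new ArrayList<>();
        String context = null;
        String record = null;
        Map<String, String> tags = new LinkedHashMap<>();
        for (String line : text.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') {
                indent++;
            }
            String body = line.substring(indent);
            if (indent == 0) {
                context = body;
            } else if (indent == 2) {
                record = body;
            } else if (indent == 4) {
                tags = parseTags(body);
            } else if (indent == 6) {
                String[] kv = splitPair(body);
                out.add(new Metric(context, record, tags, kv[0], kv[1]));
            } else {
                throw new IllegalArgumentException("Unexpected indentation: " + line);
            }
        }
        return out;
    }

    /** Parses "{k=v, k=v}"; values may be empty, as with sessionId in the sample. */
    private static Map<String, String> parseTags(String body) {
        if (body.length() < 2 || !body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("Not a tag line: " + body);
        }
        Map<String, String> tags = new LinkedHashMap<>();
        String inner = body.substring(1, body.length() - 1);
        if (inner.isEmpty()) {
            return tags;
        }
        for (String pair : inner.split(", ")) {
            String[] kv = splitPair(pair);
            tags.put(kv[0], kv[1]);
        }
        return tags;
    }

    /** Splits on the first '=' so that values may themselves contain '='. */
    private static String[] splitPair(String pair) {
        int eq = pair.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("Expected name=value: " + pair);
        }
        return new String[] { pair.substring(0, eq), pair.substring(eq + 1) };
    }
}
```

A format with explicit quoting, such as JSON, would avoid the ambiguity around commas in tag values that this sketch simply assumes away.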

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

