hbase-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-4366) dynamic metrics logging
Date Sat, 10 Sep 2011 00:42:08 GMT
dynamic metrics logging
-----------------------

                 Key: HBASE-4366
                 URL: https://issues.apache.org/jira/browse/HBASE-4366
             Project: HBase
          Issue Type: New Feature
          Components: metrics
            Reporter: Ming Ma
            Assignee: Ming Ma


First, if there is an existing solution for this, I will close this jira. I also realize we
already have various overlapping solutions, and creating another one isn't necessarily the
best approach. However, I couldn't find anything that meets the need, so I am opening this jira
for discussion.

We have some scenarios in hbase/mapreduce/hdfs that require logging a large number of dynamic
metrics. They can be used for troubleshooting, better measurement of the system, and scorecards.
For example:
 
1. HBase. Get metrics such as requests per second that are specific to a table or column family.
2. Mapreduce job history analysis. We would like to find out all the job ids that were submitted,
completed, etc. in a specific time window.

For troubleshooting, what people usually do today is: 1) use the current machine-level metrics to
find out which machine has the issue; 2) go to that machine and analyze the local log.



The characteristics of this kind of metrics:
 
1. They can't be predefined. The key, such as a table name or job id, is dynamic.
2. The number of such metrics could be much larger than what the current metrics framework
can handle.
3. We don't have a scenario that requires near-real-time query support; e.g., the time from when
a metric is generated to when it is available to query can be something like an hour.
4. How the data is consumed is highly application specific.

Some ideas:

1. Provide an interface for any application to log data.
2. The metrics can be written to log files. The log files or log entries will be loaded into
HBase or HDFS asynchronously, possibly on a separate cluster.
3. To consume the data, an application could run a map reduce job on the log files for aggregation,
or do random reads directly from HBase.
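
To make the ideas above concrete for discussion, here is a minimal sketch, not a proposed implementation. It assumes one JSON record per line in a local log file, which stands in for the log files that would later be loaded into HBase or HDFS. The names DynamicMetricsLogger, log, and aggregate, and the record fields ts, name, dims, and value, are all hypothetical and not part of any existing HBase or Hadoop API. The aggregate function plays the role that a map reduce job over the log files would play at scale.

```python
import json
import time
from collections import defaultdict


class DynamicMetricsLogger:
    """Hypothetical interface: appends one JSON record per metric event to a log file."""

    def __init__(self, path):
        self.path = path

    def log(self, name, dimensions, value=1, timestamp=None):
        # The dimensions dict carries the dynamic keys (table name, job id, ...),
        # so nothing has to be predefined in a metrics registry.
        record = {
            "ts": time.time() if timestamp is None else timestamp,
            "name": name,
            "dims": dimensions,
            "value": value,
        }
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, sort_keys=True) + "\n")


def aggregate(path, name, dim_key, start_ts, end_ts):
    """Sum one metric per value of dim_key for records with start_ts <= ts < end_ts."""
    totals = defaultdict(int)
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec["name"] != name or not (start_ts <= rec["ts"] < end_ts):
                continue
            if dim_key in rec["dims"]:
                totals[rec["dims"][dim_key]] += rec["value"]
    return dict(totals)
```

With this sketch, a region server could call log("requests", {"table": "users", "cf": "info"}) on each request, and an offline job could call aggregate(path, "requests", "table", start, end) to get per-table request counts for a time window. The same pattern would cover the job history case by logging an event such as "job_submitted" with a job_id dimension.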


Comments?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
