hbase-issues mailing list archives

From "takeshi.miao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
Date Fri, 30 Aug 2013 07:27:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754458#comment-13754458 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack]

Here are the answers to your questions.

bq. ./hbase-0.95.3-SNAPSHOT/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.tool.Canary
bq. ... it goes off and does something; default looks to go and get from all regions.
Yes, its default behavior matches the old tool: it monitors all regions.

bq. You add 2013-08-29 09:32:16,463 DEBUG [main] tool.Canary: runCount=2. What does it mean
It is an internal DEBUG message that counts how many loops this monitor instance has run. It
helps the user check whether the monitor instance is behaving as expected.

The following are the questions you asked about the _'-regionserver'_ option:
bq. Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table/regionserver 1 [table/regionserver

bq. Would it be clearer if the -regionserver option took arguments as in -regionserver=rs1,rs2,rs3
bq. How to interpret this then:
bq. Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver=rs1 table1
bq. Would above only get regions from table1 on rs1? If no regions from table1 then it would print
out there are none?
The _'-regionserver'_ option (regionserver mode) is mutually exclusive with the default mode
(region mode): the user chooses either the default mode or regionserver mode, not both.
bq. I do not know how to read 'table/regionserver 1'. What is the '1'?
So it seems the usage output is confusing. I would like to change it to the following; what
do you think?
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table|regionserver [table|regionserver

bq. Or if you pass a table1 when you have a -regionserver option specified, you could just fail
with "Cannot pass a tablename when using the -regionserver option" – that'd probably be
Yes, this is a good suggestion, but for now I would rather not check whether the passed arguments
are table names in HBase, because I would first need to create an HBaseAdmin instance to get the
table list and then compare it with the passed arguments.
What do you think about my making the usage output for the -regionserver option more precise
instead? For example:
-regionserver  replace the table argument with regionserver,
      which enables regionserver mode instead of region mode (default)
Either way is fine with me.
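
To illustrate the two mutually exclusive modes, here is a minimal Java sketch. It is not the code in the attached patches: the class name CanaryArgsSketch, the Mode enum, and parse() are hypothetical names made up for illustration, only the -regionserver flag comes from this discussion, and every other Canary option is deliberately left out. It assumes that once -regionserver is present, the positional arguments are simply read as regionserver names, so no HBaseAdmin lookup of table names is required.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of mode selection; not the parser from the HBASE-7525 patches. */
final class CanaryArgsSketch {
    enum Mode { REGION, REGIONSERVER }

    final Mode mode;
    final List<String> targets; // table names in REGION mode, server names in REGIONSERVER mode

    private CanaryArgsSketch(Mode mode, List<String> targets) {
        this.mode = mode;
        this.targets = targets;
    }

    static CanaryArgsSketch parse(String[] args) {
        Mode mode = Mode.REGION; // default mode: monitor regions
        List<String> targets = new ArrayList<>();
        for (String arg : args) {
            if (arg.equals("-regionserver")) {
                mode = Mode.REGIONSERVER; // switches how the positional arguments are read
            } else if (arg.startsWith("-")) {
                // Assumption: the other options are out of scope for this sketch.
                throw new IllegalArgumentException("Option not handled in this sketch: " + arg);
            } else {
                targets.add(arg);
            }
        }
        return new CanaryArgsSketch(mode, targets);
    }

    public static void main(String[] args) {
        CanaryArgsSketch parsed = parse(args);
        System.out.println(parsed.mode + " " + parsed.targets);
    }
}
```

With this reading, a mistyped table name in regionserver mode would only show up later as a regionserver that cannot be found, which is the trade-off of skipping the table-list check.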

I will upload the new patches once we confirm which way to go. Thanks for your questions
and suggestions :)

> A canary monitoring program specifically for regionserver
> ---------------------------------------------------------
>                 Key: HBASE-7525
>                 URL: https://issues.apache.org/jira/browse/HBASE-7525
>             Project: HBase
>          Issue Type: New Feature
>          Components: monitoring
>    Affects Versions: 0.94.0
>            Reporter: takeshi.miao
>            Priority: Critical
>             Fix For: 0.98.0
>         Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch,
HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, HBASE-7525-trunk-v2.patch, HBASE-7525-v0.patch,
> *Motivation*
> This ticket provides a canary monitoring tool specifically for HRegionServer. Details
> follow.
> 1. Our operations team asked for this tool because a canary check for every region of
> an HBase cluster is too many for them, so I implemented this coarse-grained one based on
> the original o.a.h.h.tool.Canary.
> 2. The tool is multi-threaded: each Get request is sent by a separate thread. I chose
> this design because we have been suffering from hung region servers and the root cause is
> still unclear, so this tool can help the operations team detect a hung region server,
> if any.
> *example*
> 1. the tool docs
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
> Usage: [opts] [regionServerName 1 [regionServrName 2...]]
>  regionServerName - FQDN serverName, can use linux command:hostname -f to check your
>  where [-opts] are:
>    -help Show this help and exit.
>    -e    Use regionServerName as regular expression
>       which means the regionServerName is regular expression pattern
>    -f <B>         stop whole program if first error occurs, default is true
>    -t <N>         timeout for a check, default is 600000 (milisecs)
>    -daemon        Continuous check at defined intervals.
>    -interval <N>  Interval between checks (sec)
> 2. Will send a request to each regionserver in a HBase cluster
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary
> 3. Will send a request to a regionserver by given name
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname
> 4. Will send a request to regionserver(s) by given regular-expression
> /opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e rs1.domainname.pattern
> // another example
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e tw-poc-tm-puppet-hdn[0-9]\{1,2\}.client.tw.trendnet.org
> 5. Will send a request to a regionserver and also set a timeout limit for this test
> // query regionserver:rs1.domainname with timeout limit 10sec
> // -f false, means that will not exit this program even test failed
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 10000 rs1.domainname
> // echo "1" if timeout
> echo "$?"
> 6. Will run in daemon mode, which means it will send a request to each regionserver periodically
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon
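
The multi-threaded check with a timeout described in the quoted Motivation could look roughly like the following Java sketch. It is an illustration under stated assumptions, not the RegionServerCanary implementation: ProbeTimeoutSketch and findUnresponsive are hypothetical names, and each probe is a plain Callable standing in for the Get request that the real tool would send through the HBase client. A probe that does not finish before the shared deadline is reported, which is how a hung region server would surface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; a Callable stands in for the Get sent to one regionserver. */
final class ProbeTimeoutSketch {

    /** Runs every probe on its own thread and returns the servers that failed or timed out. */
    static List<String> findUnresponsive(Map<String, Callable<Void>> probes, long timeoutMs)
            throws InterruptedException {
        List<String> unresponsive = new ArrayList<>();
        if (probes.isEmpty()) {
            return unresponsive;
        }
        ExecutorService pool = Executors.newFixedThreadPool(probes.size());
        try {
            // Submit everything first so that one hung server cannot delay the other probes.
            Map<String, Future<Void>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Void>> e : probes.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }
            // All probes share one deadline measured from the moment they were submitted.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            for (Map.Entry<String, Future<Void>> e : pending.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    e.getValue().get(remaining, TimeUnit.NANOSECONDS);
                } catch (TimeoutException | ExecutionException ex) {
                    e.getValue().cancel(true); // interrupt the probe thread
                    unresponsive.add(e.getKey());
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return unresponsive;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Callable<Void>> probes = new LinkedHashMap<>();
        probes.put("rs-fast", () -> null);
        probes.put("rs-hung", () -> { Thread.sleep(5_000); return null; });
        System.out.println("Unresponsive: " + findUnresponsive(probes, 200));
    }
}
```

Running each probe on its own thread matters here because a single-threaded loop would block on the first hung server and never reach the others.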

