hbase-dev mailing list archives

From "Vishal Khandelwal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-15219) Canary tool does not return non-zero exit when one of region stuck state
Date Fri, 05 Feb 2016 12:33:39 GMT
Vishal Khandelwal created HBASE-15219:
-----------------------------------------

             Summary: Canary tool does not return non-zero exit when one of region stuck state

                 Key: HBASE-15219
                 URL: https://issues.apache.org/jira/browse/HBASE-15219
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.16
            Reporter: Vishal Khandelwal


2016-02-05 12:24:18,571 ERROR [pool-2-thread-7] tool.Canary - read from region CAN_1,\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1454667477865.00e77d07b8defe10704417fb99aa0418.
column family 0 failed
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Fri Feb 05 12:24:15 GMT 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller@54c9fea0, org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region CAN_1,\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1454667477865.00e77d07b8defe10704417fb99aa0418.
is not online on isthbase02-dnds1-3-crd.eng.sfdc.net,60020,1454669984738
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2852)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4468)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2984)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31186)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2149)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
	at java.lang.Thread.run(Thread.java:745)

--------
-bash-4.1$ echo $?
0

The code below prints the error but does not set or return the exit code. Because of this,
the tool can't be integrated with Nagios or other alerting systems.

Ideally it should return an error code on failures, as per the documentation:

<snip>
This tool will return non zero error codes to user for collaborating with other monitoring
tools, such as Nagios. The error code definitions are:

private static final int USAGE_EXIT_CODE = 1;
private static final int INIT_ERROR_EXIT_CODE = 2;
private static final int TIMEOUT_ERROR_EXIT_CODE = 3;
private static final int ERROR_EXIT_CODE = 4;

</snip>
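
For monitoring integration, a thin wrapper would typically translate these codes into the
plugin states Nagios expects (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is
only an illustration: the class name, the method name, and the choice of which Canary code
maps to which Nagios state are assumptions, not part of the Canary tool. Only the four
exit-code constants come from the documentation quoted above.

```java
class NagiosMappingSketch {
    // Exit codes as quoted from the Canary documentation.
    static final int USAGE_EXIT_CODE = 1;
    static final int INIT_ERROR_EXIT_CODE = 2;
    static final int TIMEOUT_ERROR_EXIT_CODE = 3;
    static final int ERROR_EXIT_CODE = 4;

    // Standard Nagios plugin return codes.
    static final int NAGIOS_OK = 0;
    static final int NAGIOS_WARNING = 1;
    static final int NAGIOS_CRITICAL = 2;
    static final int NAGIOS_UNKNOWN = 3;

    /**
     * Hypothetical mapping (an assumption, adjust to local policy):
     * read failures and timeouts are CRITICAL, while usage/init errors
     * and unrecognized codes are UNKNOWN.
     */
    static int toNagiosState(int canaryExitCode) {
        switch (canaryExitCode) {
            case 0:
                return NAGIOS_OK;
            case TIMEOUT_ERROR_EXIT_CODE:
            case ERROR_EXIT_CODE:
                return NAGIOS_CRITICAL;
            default:
                return NAGIOS_UNKNOWN;
        }
    }
}
```

Passing the Canary codes through unchanged would make a usage error (1) look like a Nagios
WARNING, which is why an explicit mapping is sketched here. Such a mapping is only useful
once the tool actually returns ERROR_EXIT_CODE on read failures.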

org.apache.hadoop.hbase.tool.Canary.RegionTask:

public Void read() {
      ....
      try {
        table = connection.getTable(region.getTable());
        tableDesc = table.getTableDescriptor();
      } catch (IOException e) {
        LOG.debug("sniffRegion failed", e);
        // The failure is only published to the sink; no exit code is set.
        sink.publishReadFailure(region, e);
        ...
        return null;
      }
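
One possible direction, as a minimal sketch rather than the actual fix: have the sink count
read failures, and have the caller turn a non-zero count into ERROR_EXIT_CODE. The names
CountingSink, getReadFailureCount and exitCodeFor are hypothetical and are not taken from
the Canary source; only the value ERROR_EXIT_CODE = 4 comes from the documentation quoted
in this report. The region is passed as a plain String here to keep the sketch
self-contained.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

class CanaryExitCodeSketch {
    // Value quoted from the Canary documentation.
    static final int ERROR_EXIT_CODE = 4;

    /** Hypothetical sink that remembers how many region reads failed. */
    static class CountingSink {
        // AtomicLong because region tasks run on a thread pool.
        private final AtomicLong readFailures = new AtomicLong();

        // Stand-in for sink.publishReadFailure(region, e): log and count.
        void publishReadFailure(String regionName, Exception e) {
            readFailures.incrementAndGet();
            System.err.println("read from region " + regionName
                + " failed: " + e.getMessage());
        }

        long getReadFailureCount() {
            return readFailures.get();
        }
    }

    /** Returns 0 when no read failed, ERROR_EXIT_CODE otherwise. */
    static int exitCodeFor(CountingSink sink) {
        return sink.getReadFailureCount() > 0 ? ERROR_EXIT_CODE : 0;
    }

    /** Simulates one failed region read and returns the resulting exit code. */
    static int demo() {
        CountingSink sink = new CountingSink();
        sink.publishReadFailure("exampleRegion", new IOException("region is not online"));
        return exitCodeFor(sink);
    }
}
```

With something along these lines, the tool's entry point could pass the result to
System.exit after all region tasks finish, so that `echo $?` reports 4 instead of 0 for the
failure shown above.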




