hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maddineni Sukumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15219) Canary tool does not return non-zero exit code when one of regions is in stuck state
Date Fri, 12 Feb 2016 14:42:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144640#comment-15144640
] 

Maddineni Sukumar commented on HBASE-15219:
-------------------------------------------

I tried below simple scenario and tool is not giving error even though I see so many exceptions
and region read failed line in the logs

1. Create table with 15 regions and then take any one region offline by closing it using hbase
shell "close_region" command
2. Now execute scan on that table in shell and scan fails with error region offline, this
is expected
3. Now execute Canary and it shows so many NotServingRegionException and also below log line

          "ERROR [pool-2-thread-1] tool.Canary - read from region t1,11111111,1455282821587.56c76e5f8afa7b4b805b1063a5f2e115.
column family f1 failed"

    But still Canary returns zero as return code. 

[~ted_yu]  

> Canary tool does not return non-zero exit code when one of regions is in stuck state

> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-15219
>                 URL: https://issues.apache.org/jira/browse/HBASE-15219
>             Project: HBase
>          Issue Type: Bug
>          Components: canary
>    Affects Versions: 0.98.16
>            Reporter: Vishal Khandelwal
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.2.1, 0.98.18
>
>         Attachments: HBASE-15219.v1.patch, HBASE-15219.v3.patch, HBASE-15219.v4.patch,
HBASE-15219.v5.patch, HBASE-15219.v7.patch
>
>
> {code}
> 2016-02-05 12:24:18,571 ERROR [pool-2-thread-7] tool.Canary - read from region CAN_1,\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1454667477865.00e77d07b8defe10704417fb99aa0418.
column family 0 failed
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
> Fri Feb 05 12:24:15 GMT 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller@54c9fea0,
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException:
Region CAN_1,\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1454667477865.00e77d07b8defe10704417fb99aa0418.
is not online on isthbase02-dnds1-3-crd.eng.sfdc.net,60020,1454669984738
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2852)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4468)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2984)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31186)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2149)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> 	at java.lang.Thread.run(Thread.java:745)
> --------
> -bash-4.1$ echo $?
> 0
> {code}
> Below code prints the error but it does sets/returns the exit code. Due to this tool
can't be integrated with nagios or other alerting. 
> Ideally it should return error for failures. as pre the documentation:
> <snip>
> This tool will return non zero error codes to user for collaborating with other monitoring
tools, such as Nagios. The error code definitions are:
> private static final int USAGE_EXIT_CODE = 1;
> private static final int INIT_ERROR_EXIT_CODE = 2;
> private static final int TIMEOUT_ERROR_EXIT_CODE = 3;
> private static final int ERROR_EXIT_CODE = 4;
> </snip>
> {code}
> org.apache.hadoop.hbase.tool.Canary.RegionTask 
> public Void read() {
>       ....
>       try {
>         table = connection.getTable(region.getTable());
>         tableDesc = table.getTableDescriptor();
>       } catch (IOException e) {
>         LOG.debug("sniffRegion failed", e);
>         sink.publishReadFailure(region, e);
>        ...
>         return null;
>       }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message