hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10432) Rpc retries non-recoverable error
Date Tue, 25 Feb 2014 21:41:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912104#comment-13912104 ]

stack commented on HBASE-10432:
-------------------------------

I like your argument @nkeywal.

My take was that we are flipping to the other extreme here... where rather than just retrying
everything unless it is explicitly called out as not retryable, we now retry only the known
retryables.  We want a reactive, fail-fast system.  Too long we've been off in the murky world
of retries and timeouts that came from the mapreduce/batch domain rather than from live serving;
you've been doing a bunch of work elsewhere to help fix this.  I was thinking this flip would
bubble up new types of failures that we could then add to the retry set, or ... we end up with
a system that fails fast.
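
To make the two retry extremes concrete, here is a minimal sketch; the class and method names
are mine and the listed exceptions are only assumed examples, so this is not the actual
RpcRetryingCaller logic:

{noformat}
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import org.apache.hadoop.hbase.DoNotRetryIOException;

// Illustrative sketch only; not HBase's actual retry code.
public class RetryPolicySketch {

  // "Blacklist" style: retry everything unless it is explicitly non-retryable.
  // Unknown failure modes get retried by default, possibly until the caller times out.
  static boolean retryUnlessKnownFatal(Throwable t) {
    return !(t instanceof DoNotRetryIOException) && !(t instanceof Error);
  }

  // "Whitelist" style: retry only failures positively known to be transient.
  // Unknown failure modes fail fast and surface to the caller immediately.
  static boolean retryOnlyKnownTransient(Throwable t) {
    return t instanceof ConnectException || t instanceof SocketTimeoutException;
  }
}
{noformat}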

> Rpc retries non-recoverable error
> ---------------------------------
>
>                 Key: HBASE-10432
>                 URL: https://issues.apache.org/jira/browse/HBASE-10432
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.0, 0.96.2, 0.99.0
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Minor
>         Attachments: HBASE-10432.00.patch, HBASE-10432.01.patch, HBASE-10432.02.patch, HBASE-10432.02.patch, exception.txt
>
>
> I've recently been working with hbase/trunk + hive/trunk. I had a hive command eventually
> time out with the following exception (stacktrace truncated).
> {noformat}
> Caused by: java.io.IOException: Could not set up IO Streams
>         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:922)
>         at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1536)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1425)
>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:28857)
>         at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:302)
>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:157)
>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120)
>         ... 43 more
> Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Lorg/apache/hadoop/net/SocketInputWrapper;
>         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:861)
>         ... 52 more
> {noformat}
> The root cause looks like a dependency version mismatch (Hive compiled vs hadoop1, HBase
> vs hadoop2). However, we still retry this exception, even though it'll never actually complete.
> We should be more careful about where we blindly catch Throwables.
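
As a hedged illustration of that last point (my own sketch, not the attached patches), a retry
loop can rethrow linkage Errors such as the NoSuchMethodError above immediately while still
retrying plausibly transient IOExceptions; all names below are hypothetical:

{noformat}
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch of fail-fast retrying; not the code from the attached patches.
// Assumes maxAttempts >= 1.
public class FailFastRetrier {

  static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (Error e) {
        throw e;                   // NoSuchMethodError etc. will never succeed on retry
      } catch (IOException e) {
        last = e;                  // plausibly transient: try again
      } catch (Exception e) {
        throw new IOException(e);  // unexpected: surface it instead of looping on it
      }
    }
    throw last;
  }
}
{noformat}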



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
