hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5973) Add ability for potentially long-running IPC calls to abort if client disconnects
Date Wed, 09 May 2012 22:05:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271863#comment-13271863 ]

Todd Lipcon commented on HBASE-5973:
------------------------------------

Attached patch implements the suggested idea, and hooks it up for scanner.next().
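
A minimal, self-contained Java sketch of what hooking the check into scanner.next() amounts to. It assumes the IPC layer flips a volatile flag on the call once it notices the client socket is gone (how it notices is out of scope here). CallerDisconnectedException and throwExceptionIfCallerDisconnected() borrow their names from the stack trace in this comment, but the class bodies, the Call fields, and the next(), perRowHook, and runDemo() helpers are hypothetical and are not the code in hbase-5973.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DisconnectAwareScan {

    /** Unchecked here only to keep the sketch short. */
    static class CallerDisconnectedException extends RuntimeException {
        CallerDisconnectedException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the server-side call being processed. */
    static class Call {
        private final long startMillis = System.currentTimeMillis();
        // Assumed to be flipped by the IPC layer when the client socket closes.
        private volatile boolean connectionOpen = true;

        void markDisconnected() { connectionOpen = false; }

        void throwExceptionIfCallerDisconnected() {
            if (!connectionOpen) {
                long elapsed = System.currentTimeMillis() - startMillis;
                throw new CallerDisconnectedException(
                    "Aborting call after " + elapsed + " ms, since caller disconnected");
            }
        }
    }

    /**
     * next()-style loop: examine rows until `limit` of them pass the filter or the
     * input runs out, checking the caller before each row.
     */
    static List<Integer> next(Call call, List<Integer> rows, Predicate<Integer> filter,
                              int limit, Runnable perRowHook) {
        List<Integer> results = new ArrayList<>();
        for (int row : rows) {
            if (results.size() >= limit) {
                break;
            }
            call.throwExceptionIfCallerDisconnected(); // stop wasting work
            perRowHook.run();
            if (filter.test(row)) {
                results.add(row);
            }
        }
        return results;
    }

    /** Returns how many rows were examined before the scan aborted (-1 if it didn't). */
    static int runDemo() {
        Call call = new Call();
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(i);
        }
        int[] examined = {0};
        try {
            // Very restrictive filter: nothing matches, so without the check the
            // loop would walk all 100 rows looking for 50 results.
            next(call, rows, r -> false, 50, () -> {
                // Simulate the client timing out and disconnecting after 5 rows.
                if (++examined[0] == 5) {
                    call.markDisconnected();
                }
            });
            return -1;
        } catch (CallerDisconnectedException e) {
            return examined[0];
        }
    }

    public static void main(String[] args) {
        System.out.println("aborted after examining " + runDemo() + " rows"); // 5 rows
    }
}
```

The check sits at the top of the per-row loop, so a scan with a very restrictive filter stops within one row of the disconnect instead of continuing to hunt for the requested number of matches.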

I spent 2.5 hours trying to write a test case for it, but we have so many layers of byzantine
caching above the IPC sockets that I couldn't figure out how to make a client IPC connection
actually hard-disconnect. So I tested it from the shell. Here's the manual test plan:

1) create a table with 100 or so rows
2) issue the following from the shell:

{code}
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.95-SNAPSHOT, r5c65cc4a19fbc00876a365b10e98142238dc9a97, Wed May  9 13:06:25 PDT 2012

hbase(main):001:0> import org.apache.hadoop.hbase.filter.TestFilter
=> Java::OrgApacheHadoopHbaseFilter::TestFilter
hbase(main):002:0> scan 't1', { FILTER => TestFilter::SlowScanFilter.new(), CACHE => 50 }
ROW                                            COLUMN+CELL
{code}
(shell will hang here)

On the server side, you should see:
{code}

12/05/09 15:03:29 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
12/05/09 15:03:30 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
12/05/09 15:03:31 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
12/05/09 15:03:32 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
{code}

Now ^C the shell. You should see on the server:

{code}
12/05/09 15:03:33 ERROR regionserver.RegionServer: 
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call scan(null, scannerId: 4581116627867187291
numberOfRows: 50
closeScanner: false
), rpc version=1, client version=1, methodsFingerPrint=-944626147 from 127.0.0.1:55648 after 5009 ms, since caller disconnected
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:417)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3433)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3391)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3415)
        at org.apache.hadoop.hbase.regionserver.RegionServer.scan(RegionServer.java:828)
        at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:358)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1387)
12/05/09 15:03:33 WARN ipc.HBaseServer: IPC Server Responder, call scan(null, scannerId: 4581116627867187291
numberOfRows: 50
closeScanner: false
), rpc version=1, client version=1, methodsFingerPrint=-944626147 from 127.0.0.1:55648: output error
12/05/09 15:03:33 WARN ipc.HBaseServer: IPC Server handler 0 on 58364 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
{code}

We could probably improve the messaging slightly, but this is at least an improvement in that
the handler thread no longer stays hung up indefinitely after the client goes away.
                
> Add ability for potentially long-running IPC calls to abort if client disconnects
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-5973
>                 URL: https://issues.apache.org/jira/browse/HBASE-5973
>             Project: HBase
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hbase-5973.txt
>
>
> We recently had a cluster issue where a user was submitting scanners with a very restrictive filter, and then calling next() with a high scanner caching value. The clients would generally time out the next() call and disconnect, but the IPC call kept running, trying to fill the requested number of rows. Since this was in the context of MR, the tasks making the calls would retry, and the retries would be more likely to time out due to contention with the previous, still-running scanner next() call. Eventually, the system spiraled out of control.
> We should add a hook to the IPC system so that RPC calls can check whether the client has already disconnected. In that case, the next() call could abort processing, since any further work would be wasted. I imagine coprocessor endpoints, etc., could make good use of this as well.
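
One way to make such a hook reachable from code that never receives the call object (region internals, coprocessor endpoints) is a per-handler-thread "current call" slot. The sketch below assumes a ThreadLocal; RpcCallContext, CURRENT_CALL, handle(), and sumUpTo() are hypothetical names used only for illustration, and the exception is unchecked just to keep it short.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CurrentCallContext {

    /** Unchecked here only to keep the sketch short. */
    static class CallerDisconnectedException extends RuntimeException {
        CallerDisconnectedException(String msg) { super(msg); }
    }

    /** Hypothetical view of the in-flight RPC that server-side code may consult. */
    interface RpcCallContext {
        void throwExceptionIfCallerDisconnected();
    }

    /** Hypothetical slot: each handler thread publishes the call it is serving. */
    static final ThreadLocal<RpcCallContext> CURRENT_CALL = new ThreadLocal<>();

    /** Endpoint-style work that checks the caller without taking a Call parameter. */
    static long sumUpTo(long n) {
        RpcCallContext call = CURRENT_CALL.get(); // null when not serving an RPC
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            if (call != null && i % 1000 == 0) {
                call.throwExceptionIfCallerDisconnected(); // abort wasted work
            }
            sum += i;
        }
        return sum;
    }

    /** What a handler thread would do around each call it dispatches. */
    static long handle(AtomicBoolean clientConnected, long n) {
        CURRENT_CALL.set(() -> {
            if (!clientConnected.get()) {
                throw new CallerDisconnectedException("caller disconnected");
            }
        });
        try {
            return sumUpTo(n);
        } finally {
            CURRENT_CALL.remove(); // handler threads are reused; don't leak the call
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new AtomicBoolean(true), 10_000)); // 50005000
        try {
            handle(new AtomicBoolean(false), 10_000);
        } catch (CallerDisconnectedException e) {
            System.out.println("aborted: " + e.getMessage()); // aborted: caller disconnected
        }
        System.out.println(sumUpTo(10)); // 55: no current call, so no checks
    }
}
```

The endpoint-style loop checks only every 1,000 iterations to keep the overhead negligible; that interval is an arbitrary illustrative choice. Clearing the slot in a finally block matters because IPC handler threads are pooled and reused across calls.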


        
