hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Cunningham (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11295) Long running scan produces OutOfOrderScannerNextException
Date Wed, 04 Jun 2014 20:39:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018135#comment-14018135
] 

Jeff Cunningham commented on HBASE-11295:
-----------------------------------------

Anoop, thanks for response.

Unfortunately, OutOfOrderScannerNextException that we get when nextCallSeq is different is
a DoNotRetryIOException, so no retry occurs.  The region server log has no record of this
event.  We only see this on the (user) client side.

Interestingly, the server's cached nextCallSeq is incremented before the server scan is executed.
 When client retries, their nextCallSeq values are already out of sync -- the server scan
is still executing and the client had no notification of timeout or error.  Should the server's
cached nextCallSeq value only be incremented after the scan comes back (or fails)?

If you look at the server log in the attached tarball, you will see the sequence for the one
client scan (notice the handler threadIDs):
15:15:54,824 (RpcServer.handler=94) - >>> Getting scanner for new request: 1940798815214593802
15:15:54,824 (RpcServer.handler=94) - >>>   1940798815214593802 nextCallSeq: 0
15:15:54,825 (RpcServer.handler=96) - >>> Getting scanner for ID: 1940798815214593802
15:15:54,825 (RpcServer.handler=96) - >>>   1940798815214593802 nextCallSeq: 0
15:15:54,825 (RpcServer.handler=96) - >>>     incrementing nexCallSeq for : 1940798815214593802
15:15:54,825 (RpcServer.handler=96) - WaitFilter snoozin for (150000) ms.
15:16:55,135 (RpcServer.handler=97) - >>> Getting scanner for ID: 1940798815214593802
15:16:55,135 (RpcServer.handler=97) - >>>   1940798815214593802 nextCallSeq: 1
15:16:55,142 (RpcServer.handler=92) - >>> Getting scanner for new request: 8946109289649235722
15:16:55,143 (RpcServer.handler=92) - >>>   8946109289649235722 nextCallSeq: 0
15:16:55,143 (RpcServer.handler=98) - >>> Getting scanner for ID: 8946109289649235722
15:16:55,143 (RpcServer.handler=98) - >>>   8946109289649235722 nextCallSeq: 0
15:16:55,143 (RpcServer.handler=98) - >>>     incrementing nexCallSeq for : 8946109289649235722
15:16:55,143 (RpcServer.handler=98) - WaitFilter snoozin for (150000) ms.
>>> Client sees OutOfOrderScannerException at this point after ~ 1 min. Nothing in
region server logs. <<<



> Long running scan produces OutOfOrderScannerNextException
> ---------------------------------------------------------
>
>                 Key: HBASE-11295
>                 URL: https://issues.apache.org/jira/browse/HBASE-11295
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.96.0
>            Reporter: Jeff Cunningham
>         Attachments: OutOfOrderScannerNextException.tar.gz
>
>
> Attached Files:
> HRegionServer.java - instramented from 0.96.1.1-cdh5.0.0
> HBaseLeaseTimeoutIT.java - reproducing JUnit 4 test
> WaitFilter.java - Scan filter (extends FilterBase) that overrides filterRowKey() to sleep
during invocation
> SpliceFilter.proto - Protobuf defintiion for WaitFilter.java
> OutOfOrderScann_InstramentedServer.log - instramented server log
> Steps.txt - this note
> Set up:
> In HBaseLeaseTimeoutIT, create a scan, set the given filter (which sleeps in overridden
filterRowKey() method) and set it on the scan, and scan the table.
> This is done in test client_0x0_server_150000x10().
> Here's what I'm seeing (see also attached log):
> A new request comes into server (ID 1940798815214593802 - RpcServer.handler=96) and a
RegionScanner is created for it, cached by ID, immediately looked up again and cached RegionScannerHolder's
nextCallSeq incremeted (now at 1).
> The RegionScan thread goes to sleep in WaitFilter#filterRowKey().
> A short (variable) period later, another request comes into the server (ID 8946109289649235722
- RpcServer.handler=98) and the same series of events happen to this request.
> At this point both RegionScanner threads are sleeping in WaitFilter.filterRowKey(). After
another period, the client retries another scan request which thinks its next_call_seq is
0.  However, HRegionServer's cached RegionScannerHolder thinks the matching RegionScanner's
nextCallSeq should be 1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message