hbase-issues mailing list archives

From "Sean Sechrist (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3686) Scanner timeout on RegionServer but Client won't know what happened
Date Fri, 25 Mar 2011 14:03:06 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011203#comment-13011203 ]

Sean Sechrist commented on HBASE-3686:

I did a little more testing and it turns out this problem isn't limited to the misconfiguration.

You'll also lose rows if you kill -9 a region server in the middle of a scan. In HTable.ClientScanner.next(),
there's a skipFirst boolean that is supposed to skip the first row, which was "already let
out on a previous invocation". But instead of skipping just that first row, getConnection().getRegionServerWithRetries(callable)
is called an extra time, which skips a whole batch of [caching] rows.
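Below is a toy Java model of the restart path described above. It is not HBase code. Rows are plain strings, the "server" hands back at most `caching` rows per call, and after a simulated failure the scan is reopened at the last fetched row (inclusive), so the already-seen row comes back first. The class and method names (`BuggyRestartModel`, `rows`, `fetch`, `scan`) are hypothetical. With the extra "reget", the entire first batch after the restart is discarded. That batch holds the duplicate row plus `caching - 1` rows the client has never seen, and in this model those rows are lost.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of the skipFirst restart path in the 0.89-era
 * HTable.ClientScanner.next(). Not HBase code; strings stand in for rows.
 */
class BuggyRestartModel {

    /** Builds rows "row0" .. "row(n-1)", already in scan order. */
    static List<String> rows(int n) {
        List<String> r = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            r.add("row" + i);
        }
        return r;
    }

    /** Simulated server call: up to `caching` rows starting at index `from` (inclusive). */
    static List<String> fetch(List<String> table, int from, int caching) {
        if (from >= table.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(table.subList(from, Math.min(from + caching, table.size())));
    }

    /** Scans the table; the scanner is "lost" once `failAfterBatches` (>= 1) batches were fetched. */
    static List<String> scan(List<String> table, int caching, int failAfterBatches) {
        List<String> seen = new ArrayList<>();
        String lastResult = null; // last row fetched from the "server"
        boolean skipFirst = false;
        int pos = 0;              // server-side cursor of the current scanner
        int batches = 0;
        while (true) {
            if (batches == failAfterBatches) {
                // Scanner lost (e.g. region server killed): reopen at lastResult, inclusive.
                pos = table.indexOf(lastResult);
                skipFirst = true;
            }
            List<String> values = fetch(table, pos, caching);
            pos += values.size();
            if (skipFirst) {
                skipFirst = false;
                // Buggy "reget": the whole batch is dropped, not just its first row.
                values = fetch(table, pos, caching);
                pos += values.size();
            }
            if (values.isEmpty()) {
                break; // no more rows
            }
            seen.addAll(values);
            lastResult = values.get(values.size() - 1);
            batches++;
        }
        return seen;
    }

    public static void main(String[] args) {
        // 10 rows, caching = 3, scanner lost after 2 batches.
        System.out.println(scan(rows(10), 3, 2));
        // prints [row0, row1, row2, row3, row4, row5, row8, row9]
    }
}
```

In this run, row6 and row7 (`caching - 1` of them) never reach the caller, and nothing signals that anything went wrong.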

So I think fixing it to skip only 1 row will also fix the problem when there's a misconfiguration,
so the proposed fix of using the region server's scanner timeout won't be needed.
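Here is a sketch of the "skip only 1 row" idea under the same toy assumptions: the names are hypothetical, strings stand in for rows, and the scan reopens at the last fetched row, inclusive. The first batch after the restart is kept, and only its leading row is dropped, and only when it matches the last row already handed out. This illustrates the idea in the comment. It is not the actual HBase patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a scanner restart that skips only the duplicate first row. */
class FixedRestartModel {

    /** Builds rows "row0" .. "row(n-1)", already in scan order. */
    static List<String> rows(int n) {
        List<String> r = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            r.add("row" + i);
        }
        return r;
    }

    /** Simulated server call: up to `caching` rows starting at index `from` (inclusive). */
    static List<String> fetch(List<String> table, int from, int caching) {
        if (from >= table.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(table.subList(from, Math.min(from + caching, table.size())));
    }

    /** Scans the table; the scanner is "lost" once `failAfterBatches` (>= 1) batches were fetched. */
    static List<String> scan(List<String> table, int caching, int failAfterBatches) {
        List<String> seen = new ArrayList<>();
        String lastResult = null; // last row fetched from the "server"
        boolean skipFirst = false;
        int pos = 0;              // server-side cursor of the current scanner
        int batches = 0;
        while (true) {
            if (batches == failAfterBatches) {
                // Scanner lost: reopen at lastResult, inclusive, and remember to skip it.
                pos = table.indexOf(lastResult);
                skipFirst = true;
            }
            List<String> values = fetch(table, pos, caching);
            if (values.isEmpty()) {
                break; // no more rows
            }
            pos += values.size();
            if (skipFirst) {
                skipFirst = false;
                // Fix: drop only the row that was already returned before the restart.
                if (values.get(0).equals(lastResult)) {
                    values.remove(0);
                }
            }
            seen.addAll(values);
            if (!values.isEmpty()) {
                lastResult = values.get(values.size() - 1);
            }
            batches++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scan(rows(10), 3, 2));
        // prints [row0, row1, row2, row3, row4, row5, row6, row7, row8, row9]
    }
}
```

Removing the row instead of breaking out of the loop matters when `caching` is 1. In that case the restart batch can become empty after the duplicate is dropped, and the scan still has to continue.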

> Scanner timeout on RegionServer but Client won't know what happened
> -------------------------------------------------------------------
>                 Key: HBASE-3686
>                 URL: https://issues.apache.org/jira/browse/HBASE-3686
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.89.20100924
>            Reporter: Sean Sechrist
>            Priority: Minor
> This can cause rows to be lost from a scan.
> See this thread where the issue was brought up: http://search-hadoop.com/m/xITBQ136xGJ1
> If hbase.regionserver.lease.period is higher on the client than on the server, we can get this series of events:
> 1. Client is scanning along happily, and does something slow.
> 2. Scanner times out on region server
> 3. Client calls HTable.ClientScanner.next()
> 4. The region server throws an UnknownScannerException
> 5. Client catches the exception and sees that the time elapsed is not longer than its own hbase.regionserver.lease.period setting, so it doesn't throw a ScannerTimeoutException. Instead, it treats it like an NSRE (NotServingRegionException).
> Right now the workaround is to make sure hbase.regionserver.lease.period is consistent between the client and the region servers.
> A possible fix would be to use the region server's scanner timeout rather than the local one.
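The following is a plain-Java sketch of the elapsed-time check in step 5 and of the proposed fix. It models only that comparison. `lastNextMs` is when the client last called next(), and the lease argument is either the client's local hbase.regionserver.lease.period or the region server's value. How the client would learn the server's value is not modelled. The class, enum, and method names are hypothetical, and the millisecond values in `main` are illustrative, not defaults.

```java
/**
 * Hypothetical model of how the client classifies an UnknownScannerException.
 * Only the elapsed-time comparison is modelled; no HBase classes are used.
 */
class LeaseCheckModel {

    enum Outcome { SCANNER_TIMEOUT, TREAT_AS_NSRE }

    /** Timeout if more than `leaseMs` has passed since the last next() call. */
    static Outcome classify(long lastNextMs, long nowMs, long leaseMs) {
        return nowMs - lastNextMs > leaseMs ? Outcome.SCANNER_TIMEOUT : Outcome.TREAT_AS_NSRE;
    }

    public static void main(String[] args) {
        long serverLeaseMs = 60_000;  // lease period on the region server
        long clientLeaseMs = 120_000; // misconfigured: higher on the client
        long lastNextMs = 0;
        long nowMs = 90_000;          // client spent 90 s doing something slow

        // The server lease (60 s) has expired, so the server reports an unknown scanner.
        // Using the local lease, the client misreads this as a region move:
        System.out.println(classify(lastNextMs, nowMs, clientLeaseMs)); // TREAT_AS_NSRE
        // Using the server's lease, as proposed, it is reported as a timeout:
        System.out.println(classify(lastNextMs, nowMs, serverLeaseMs)); // SCANNER_TIMEOUT
    }
}
```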

This message is automatically generated by JIRA.