hbase-issues mailing list archives

From "Sean Sechrist (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3686) Scanner timeout on RegionServer but Client won't know what happened
Date Fri, 25 Mar 2011 14:03:06 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011203#comment-13011203 ]

Sean Sechrist commented on HBASE-3686:
--------------------------------------

I did a little more testing and it turns out this problem isn't limited to the misconfiguration.

You'll also lose rows if you kill -9 a region server in the middle of a scan. In
HTable.ClientScanner.next(), there's a skipFirst boolean that is supposed to skip the first
row that was "already let out on a previous invocation". But instead of just skipping the
first row, getConnection().getRegionServerWithRetries(callable) is called an extra time,
which skips [caching] rows.
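
As a rough illustration of the difference, here is a toy model of the resume step. It is
not the HTable.ClientScanner code: FakeScanner, resumeBuggy and resumeFixed are hypothetical
names, and the model assumes the scanner is reopened at the last row the caller already
received.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: FakeScanner, resumeBuggy and resumeFixed are hypothetical
// names and do not exist in HBase.
class SkipFirstSketch {

    /** Stand-in for a server-side scanner that hands out up to `caching` rows per call. */
    static final class FakeScanner {
        private final List<String> rows;
        private int cursor;

        FakeScanner(List<String> rows, int startIndex) {
            this.rows = rows;
            this.cursor = startIndex;
        }

        List<String> nextBatch(int caching) {
            int end = Math.min(cursor + caching, rows.size());
            List<String> batch = new ArrayList<>(rows.subList(cursor, end));
            cursor = end;
            return batch;
        }
    }

    /** Reopens at the last row already returned, then discards a whole batch. */
    static List<String> resumeBuggy(List<String> table, int lastReturnedIndex, int caching) {
        FakeScanner scanner = new FakeScanner(table, lastReturnedIndex);
        scanner.nextBatch(caching);        // extra fetch: up to `caching` rows thrown away
        return scanner.nextBatch(caching);
    }

    /** Reopens at the last row already returned, then drops only that one row. */
    static List<String> resumeFixed(List<String> table, int lastReturnedIndex, int caching) {
        FakeScanner scanner = new FakeScanner(table, lastReturnedIndex);
        List<String> batch = scanner.nextBatch(caching);
        if (!batch.isEmpty()) {
            batch.remove(0);               // skip exactly the row the caller has already seen
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7");
        // The caller has already seen r0..r2 when the region server dies; caching is 3.
        System.out.println("buggy: " + resumeBuggy(table, 2, 3));
        System.out.println("fixed: " + resumeFixed(table, 2, 3));
    }
}
```

In this model the buggy resume hands back r5..r7, so r3 and r4 never reach the caller,
while the fixed resume continues with r3.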

So I think fixing it to skip only 1 row will also fix the problem when there's a
misconfiguration, so sending the timeout to the server won't be needed.

> Scanner timeout on RegionServer but Client won't know what happened
> -------------------------------------------------------------------
>
>                 Key: HBASE-3686
>                 URL: https://issues.apache.org/jira/browse/HBASE-3686
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.89.20100924
>            Reporter: Sean Sechrist
>            Priority: Minor
>
> This can cause rows to be lost from a scan.
> See this thread where the issue was brought up: http://search-hadoop.com/m/xITBQ136xGJ1
> If hbase.regionserver.lease.period is higher on the client than the server we can get this series of events:
> 1. Client is scanning along happily, and does something slow.
> 2. Scanner times out on region server
> 3. Client calls HTable.ClientScanner.next()
> 4. The region server throws an UnknownScannerException
> 5. Client catches the exception and sees that the time elapsed is not longer than its hbase.regionserver.lease.period config, so it doesn't throw a ScannerTimeoutException. Instead, it treats it like an NSRE (NotServingRegionException).
> Right now the workaround is to make sure the configs are consistent. 
> A possible fix would be to use whatever the region server's scanner timeout is, rather than the local one.
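
The sequence of events in the description can be sketched as a small decision function. This
is a simplified model rather than the client code itself: LeaseMismatchSketch, Outcome and
nextCall are hypothetical names, and it assumes the server expires a scanner once the time
since the last next() exceeds the server's lease period.

```java
// Simplified model only: these names are hypothetical and do not exist in HBase.
class LeaseMismatchSketch {

    enum Outcome { OK, SCANNER_TIMEOUT_EXCEPTION, RETRIED_LIKE_NSRE }

    /**
     * Models one call to next() made elapsedMs after the previous one.
     * clientLeaseMs and serverLeaseMs stand for hbase.regionserver.lease.period
     * as configured on the client and on the region server.
     */
    static Outcome nextCall(long elapsedMs, long clientLeaseMs, long serverLeaseMs) {
        boolean expiredOnServer = elapsedMs > serverLeaseMs;
        if (!expiredOnServer) {
            return Outcome.OK;
        }
        // The server throws UnknownScannerException. The client then compares
        // the elapsed time against its OWN lease period, not the server's.
        if (elapsedMs > clientLeaseMs) {
            return Outcome.SCANNER_TIMEOUT_EXCEPTION;
        }
        // The client believes the lease is still valid, so it retries as if the
        // region had moved; this is the path on which rows can be lost.
        return Outcome.RETRIED_LIKE_NSRE;
    }

    public static void main(String[] args) {
        // Client configured with 120 s, server with 60 s, client paused for 90 s.
        System.out.println(nextCall(90_000L, 120_000L, 60_000L));
        // Same pause with consistent configs on both sides.
        System.out.println(nextCall(90_000L, 60_000L, 60_000L));
    }
}
```

With mismatched settings the first call takes the retry path, while consistent settings
surface the timeout to the caller, which is why keeping the configs consistent works as a
workaround.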

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
