hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8310) HBase snapshot timeout default values and TableLockManger timeout
Date Wed, 22 May 2013 05:03:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663779#comment-13663779 ]

Jerry He commented on HBASE-8310:

Thanks for your comment. I totally agree with you.
Sorry, the statement you quoted was poorly worded on my part.

Yes, the "reject snapshot" exception is thrown in prepareToTakeSnapshot() by checking the
snapshotHandler. But in this particular case, at that moment:
1) The current snapshot A is blocked on the table lock in snapshotTable(). Its snapshotHandler
has not been put into the map yet.
2) The next snapshot B comes in and calls prepareToTakeSnapshot(). It passes through without
being rejected, since there is no current snapshotHandler in the map yet.
3) Snapshot B cannot enter snapshotTable() because the method is synchronized.
4) Snapshot A cannot leave snapshotTable() because it is blocked on the table lock.
5) Effectively, snapshot B is kept out by the table lock wait.
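
The sequence above can be reproduced in a small standalone sketch. This is not the HBase
code: the class, the handler map, and the lock below are hypothetical stand-ins that only
borrow the method names prepareToTakeSnapshot() and snapshotTable() from this discussion.
It assumes the handler is registered only after the table lock is acquired, as described
in step 1.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the interleaving in steps 1-5; not the real SnapshotManager.
class SnapshotRaceSketch {
    private final Map<String, String> handlers = new ConcurrentHashMap<>();
    private final ReentrantLock tableLock = new ReentrantLock();
    private final CountDownLatch aInside = new CountDownLatch(1);

    // Returns true if the request is accepted: no handler is registered for the table.
    boolean prepareToTakeSnapshot(String table) {
        return !handlers.containsKey(table);
    }

    // Synchronized, and the handler is registered only once the table lock is held.
    synchronized void snapshotTable(String table, String snapshot) throws InterruptedException {
        aInside.countDown();              // signal: a caller is inside the method
        tableLock.lockInterruptibly();    // step 1/4: may block here for a long time
        try {
            handlers.put(table, snapshot);
        } finally {
            tableLock.unlock();
        }
    }

    private void quietSnapshot(String table, String snapshot) {
        try {
            snapshotTable(table, snapshot);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns {B passed prepareToTakeSnapshot(), B was blocked entering snapshotTable()}.
    static boolean[] demo() {
        SnapshotRaceSketch master = new SnapshotRaceSketch();
        boolean bPassedPrepare = false;
        boolean bBlocked = false;
        Thread a = new Thread(() -> master.quietSnapshot("t1", "A"));
        Thread b = new Thread(() -> master.quietSnapshot("t1", "B"));
        master.tableLock.lock();          // some other operation holds the table lock
        try {
            try {
                a.start();
                master.aInside.await();   // A is inside snapshotTable(), waiting on the lock
                bPassedPrepare = master.prepareToTakeSnapshot("t1"); // step 2: not rejected
                b.start();                // step 3: B tries to enter snapshotTable()
                long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
                while (!bBlocked && System.nanoTime() < deadline) {
                    bBlocked = b.getState() == Thread.State.BLOCKED;
                    Thread.sleep(10);
                }
            } finally {
                master.tableLock.unlock(); // let A, then B, proceed
            }
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new boolean[] {bPassedPrepare, bBlocked};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("B passed prepare: " + r[0] + ", B blocked on monitor: " + r[1]);
    }
}
```

In this model B is never rejected; it is only held back by the monitor while A waits for
the table lock, which matches step 5.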

But you are right. I don't think the patch solves the problem, which is to let snapshot B
get a "Reject snapshot" exception in prepareToTakeSnapshot().

> HBase snapshot timeout default values and TableLockManger timeout
> -----------------------------------------------------------------
>                 Key: HBASE-8310
>                 URL: https://issues.apache.org/jira/browse/HBASE-8310
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.95.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 0.98.0, 0.95.2, 0.94.9
>         Attachments: trunk.patch
> There are a few timeout values and defaults being used by HBase snapshot:
> DEFAULT_MAX_WAIT_TIME (60000 ms, 1 min) for client response
> TIMEOUT_MILLIS_DEFAULT (60000 ms, 1 min) for Procedure timeout
> SNAPSHOT_TIMEOUT_MILLIS_DEFAULT (60000 ms, 1 min) for region server subprocedure
> There are also other timeouts involved, for example:
> DEFAULT_TABLE_WRITE_LOCK_TIMEOUT_MS (10 mins) for TakeSnapshotHandler#prepare()
> We could have this case:
> The user issues a sync snapshot request, waits for 1 min, and gets an exception.
> In the meantime the snapshot handler is blocked on the table lock, and the snapshot may
> continue and finish after 10 mins.
> But the user will probably re-issue the snapshot request during those 10 mins.
> This is a little confusing and messy when it happens.
> To be more reasonable, we should either increase the DEFAULT_MAX_WAIT_TIME or decrease
> the table lock waiting time.
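
The mismatch described in the issue can be shown with a scaled-down sketch. This is not
HBase code: the class and the two constants are hypothetical, with 100 ms standing in for
the 1 min client wait and 1000 ms for the 10 min table lock timeout. The only point is that
a client wait shorter than the lock timeout lets the client give up while the work still
completes later.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the client wait vs. table lock timeout mismatch.
class SnapshotTimeoutSketch {
    static final long CLIENT_WAIT_MS = 100;         // stands in for the 1 min client wait
    static final long TABLE_LOCK_TIMEOUT_MS = 1000; // stands in for the 10 min lock timeout

    // Returns {client timed out, snapshot still finished afterwards}.
    static boolean[] demo() {
        ReentrantLock tableLock = new ReentrantLock();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        tableLock.lock(); // another operation currently holds the table lock
        try {
            // The "handler": waits for the table lock, then completes the snapshot.
            Future<Boolean> snapshot = pool.submit(() -> {
                if (!tableLock.tryLock(TABLE_LOCK_TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
                    return false;
                }
                try {
                    return true;
                } finally {
                    tableLock.unlock();
                }
            });

            boolean clientTimedOut = false;
            try {
                snapshot.get(CLIENT_WAIT_MS, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                clientTimedOut = true; // the user sees an exception here
            }

            tableLock.unlock();                 // the lock frees up after the client gave up
            boolean finishedLater = snapshot.get(); // the handler still completes
            return new boolean[] {clientTimedOut, finishedLater};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            if (tableLock.isHeldByCurrentThread()) {
                tableLock.unlock();
            }
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("client timed out: " + r[0] + ", snapshot finished later: " + r[1]);
    }
}
```

Raising the client wait to at least the lock timeout, or lowering the lock timeout to at
most the client wait, removes the window in which both outcomes occur.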
