hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8310) HBase snapshot timeout default values and TableLockManger timeout
Date Mon, 20 May 2013 03:59:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661760#comment-13661760 ]

Matteo Bertozzi commented on HBASE-8310:
----------------------------------------

I haven't followed the table lock work; I think [~jmhsieh] has more context.

{quote}
The user issues a sync snapshot request, waits for 1 min, and gets an exception.
In the meantime the snapshot handler is blocked on the table lock, and the snapshot may continue
to finish after 10 mins.
But the user will probably re-issue the snapshot request during the 10 mins.
{quote}
In this case the snapshot is still in progress, so the next request will be rejected by the
snapshotManager with "Rejected taking snapshot because we are already running another snapshot",
and you should wait. No table lock is involved in that exception. But is the table lock lost
because it has a timeout? If so, the table lock logic seems wrong here.
So my vote is leaning toward -1, but I don't have enough context on the implementation
of the snapshots and the table lock.
                
> HBase snapshot timeout default values and TableLockManger timeout
> -----------------------------------------------------------------
>
>                 Key: HBASE-8310
>                 URL: https://issues.apache.org/jira/browse/HBASE-8310
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.95.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 0.98.0, 0.94.8, 0.95.2
>
>
> There are a few timeout values and defaults being used by HBase snapshot.
> DEFAULT_MAX_WAIT_TIME (60000 ms, 1 min) for client response
> TIMEOUT_MILLIS_DEFAULT (60000 ms, 1 min) for Procedure timeout
> SNAPSHOT_TIMEOUT_MILLIS_DEFAULT (60000 ms, 1 min) for region server subprocedure
>
> There are also other timeouts involved, for example,
> DEFAULT_TABLE_WRITE_LOCK_TIMEOUT_MS (10 mins) for TakeSnapshotHandler#prepare()
> We could have this case:
> The user issues a sync snapshot request, waits for 1 min, and gets an exception.
> In the meantime the snapshot handler is blocked on the table lock, and the snapshot may continue to finish after 10 mins.
> But the user will probably re-issue the snapshot request during the 10 mins.
> This is a little confusing and messy when this happens.
> To be more reasonable, we should either increase the DEFAULT_MAX_WAIT_TIME or decrease the table lock waiting time.
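
To make the timing in the description concrete, here is a minimal sketch in plain Java. It is not HBase code: the class, enum, and method names are hypothetical, only the two default values come from the description above, and it assumes for simplicity that the snapshot completes as soon as the handler gets the table lock.

```java
// Minimal sketch of the timeout mismatch described in the issue.
// Only the two default values are taken from the issue text; every name
// below is hypothetical and not part of HBase.
final class SnapshotTimeoutSketch {
    static final long DEFAULT_MAX_WAIT_TIME_MS = 60_000L;              // client wait, 1 min
    static final long DEFAULT_TABLE_WRITE_LOCK_TIMEOUT_MS = 600_000L;  // table lock wait, 10 mins

    enum Outcome {
        CLIENT_GETS_RESULT,                  // lock freed before the client gives up
        CLIENT_TIMES_OUT_SNAPSHOT_CONTINUES, // client saw an exception, snapshot still completes
        LOCK_WAIT_TIMES_OUT                  // handler gives up waiting for the table lock
    }

    /**
     * Classifies the outcome when another holder keeps the table lock for
     * lockHeldMs. Assumption: the snapshot itself takes no time once the
     * lock is acquired.
     */
    static Outcome classify(long lockHeldMs, long clientWaitMs, long lockTimeoutMs) {
        if (lockHeldMs >= lockTimeoutMs) {
            return Outcome.LOCK_WAIT_TIMES_OUT;
        }
        if (lockHeldMs >= clientWaitMs) {
            return Outcome.CLIENT_TIMES_OUT_SNAPSHOT_CONTINUES;
        }
        return Outcome.CLIENT_GETS_RESULT;
    }

    public static void main(String[] args) {
        long[] lockHeldSamplesMs = {30_000L, 120_000L, 700_000L};
        for (long lockHeldMs : lockHeldSamplesMs) {
            Outcome outcome = classify(lockHeldMs,
                    DEFAULT_MAX_WAIT_TIME_MS,
                    DEFAULT_TABLE_WRITE_LOCK_TIMEOUT_MS);
            System.out.println(lockHeldMs + " ms -> " + outcome);
        }
    }
}
```

Under these assumptions, the confusing middle outcome can only occur when the lock timeout is larger than the client wait, which is why aligning the two values (in either direction) removes that window.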

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
