Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 28 Dec 2012 09:58:13 +0000 (UTC)
From: "terry zhang (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12625410.1356688147428.58546.1356688693220@arcas>
In-Reply-To: <JIRA.12625410.1356688147428@arcas>
References: <JIRA.12625410.1356688147428@arcas>
Subject: [jira] [Commented] (HBASE-7451) [snapshot] regionserver will be
 deadlock when GlobalSnapshotOperation timeout happen
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HBASE-7451?page=3Dcom.atlassian=
.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1354=
0418#comment-13540418 ]=20

terry zhang commented on HBASE-7451:
------------------------------------

My solution is simple, Just notify snapshotErrorListener to threw a excepti=
on to let all GloballyConsistentRegionLockTask  to quit and release the reg=
ion lock.

{code:title=3DSnapshotOperation.java|borderStyle=3Dsolid}
  @Override
  public void cleanup(Exception e) {
    LOG.debug("Cleanup snapshot - handled in sub-tasks on error");
  +  snapshotErrorListener.receiveError("Cleanup snapshot", new HBaseSnapsh=
otException(e.getMessage()));
  }
{code}=20
               =20
> [snapshot] regionserver will be deadlock when GlobalSnapshotOperation tim=
eout happen
> -------------------------------------------------------------------------=
-----------
>
>                 Key: HBASE-7451
>                 URL: https://issues.apache.org/jira/browse/HBASE-7451
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: terry zhang
>            Assignee: terry zhang
>
> Hi Matteo Bertozzi and Jesse Yates, My observation is base on code in git=
hub =EF=BC=9A https://github.com/matteobertozzi/hbase/
> If we create a snapshot and meet regionserver timeout. Rs will be lock an=
d can not put any data. Please take a look at log below :
> // regionserver snapshot timeout
> org.apache.hadoop.hbase.server.commit.distributed.DistributedCommitExcept=
ion: org.apache.hadoop.hbase.server.errorhandling.exception.OperationAttemp=
tTimeoutException: Timeout elapsed! Start:1356518666984, End:1356518667584,=
 diff:600, max:600 ms
> at org.apache.hadoop.hbase.server.commit.distributed.DistributedThreePhas=
eCommitErrorDispatcher.wrap(DistributedThreePhaseCommitErrorDispatcher.java=
:135)
> at org.apache.hadoop.hbase.server.commit.distributed.DistributedThreePhas=
eCommitErrorDispatcher.operationTimeout(DistributedThreePhaseCommitErrorDis=
patcher.java:71)
> at org.apache.hadoop.hbase.server.commit.ThreePhaseCommit$1.receiveError(=
ThreePhaseCommit.java:92)
> at org.apache.hadoop.hbase.server.commit.ThreePhaseCommit$1.receiveError(=
ThreePhaseCommit.java:89)
> at org.apache.hadoop.hbase.server.errorhandling.OperationAttemptTimer$1.r=
un(OperationAttemptTimer.java:71)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> Caused by: org.apache.hadoop.hbase.server.errorhandling.exception.Operati=
onAttemptTimeoutException: Timeout elapsed! Start:1356518666984, End:135651=
8667584, diff:600, max:600 ms
> ... 3 more
> 2012-12-26 18:44:57,211 DEBUG org.apache.hadoop.hbase.server.commit.TwoPh=
aseCommit: Running cleanup phase.
> 2012-12-26 18:44:57,211 DEBUG org.apache.hadoop.hbase.regionserver.snapsh=
ot.operation.SnapshotOperation: Cleanup snapshot - handled in sub-tasks on =
error
> 2012-12-26 18:44:57,212 DEBUG org.apache.hadoop.hbase.serv
> //Waiting for 'commit allowed' latch and do not exist
> 2012-12-26 18:44:57,211 DEBUG org.apache.hadoop.hbase.server.commit.TwoPh=
aseCommit: Running cleanup phase.
> 2012-12-26 18:44:57,211 DEBUG org.apache.hadoop.hbase.regionserver.snapsh=
ot.operation.SnapshotOperation: Cleanup snapshot - handled in sub-tasks on =
error
> 2012-12-26 18:44:57,212 DEBUG org.apache.hadoop.hbase.server.commit.TwoPh=
aseCommit: Running finish phase.
> 2012-12-26 18:44:57,212 DEBUG org.apache.hadoop.hbase.regionserver.snapsh=
ot.operation.SnapshotOperation: Finish snapshot - handling in subtasks on e=
rror
> 2012-12-26 18:44:57,212 WARN org.apache.hadoop.hbase.server.errorhandling=
.OperationAttemptTimer: Timer already marked completed, ignoring!
> 2012-12-26 18:45:01,990 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:06,990 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:11,991 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:16,991 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:17,002 INFO org.apache.hadoop.hbase.server.commit.distri=
buted.zookeeper.ZKTwoPhaseCommitCohortMemberController: Received children c=
hanged event:/hbase-TERRY-73/online-snapshot/prepare
> 2012-12-26 18:45:17,002 INFO org.apache.hadoop.hbase.server.commit.distri=
buted.zookeeper.ZKTwoPhaseCommitCohortMemberController: Recieved start even=
t.
> 2012-12-26 18:45:17,002 DEBUG org.apache.hadoop.hbase.server.commit.distr=
ibuted.zookeeper.ZKTwoPhaseCommitCohortMemberController: Looking for new op=
erations under znode:/hbase-TERRY-73/online-snapshot/prepare
> 2012-12-26 18:45:17,003 INFO org.apache.hadoop.hbase.server.commit.distri=
buted.zookeeper.ZKTwoPhaseCommitCohortMemberController: Received children c=
hanged event:/hbase-TERRY-73/online-snapshot/abort
> 2012-12-26 18:45:17,003 INFO org.apache.hadoop.hbase.server.commit.distri=
buted.zookeeper.ZKTwoPhaseCommitCohortMemberController: Recieved abort even=
t.
> 2012-12-26 18:45:17,003 DEBUG org.apache.hadoop.hbase.server.commit.distr=
ibuted.zookeeper.ZKTwoPhaseCommitCohortMemberController: Checking for abort=
ed operations on node:/hbase-TERRY-73/online-snapshot/abort
> 2012-12-26 18:45:21,991 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:26,992 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:31,992 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:36,992 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:41,992 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:43,481 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCa=
che: LRU Stats: total=3D11.77 MB, free=3D1.39 GB, max=3D1.4 GB, blocks=3D5,=
 accesses=3D96, hits=3D91, hitRatio=3D94.79%, cachingAccesses=3D96, caching=
Hits=3D91, cachingHitsRatio=3D94.79%, evictions=3D0, evicted=3D0, evictedPe=
rRun=3DNaN
> 2012-12-26 18:45:46,993 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:51,993 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:45:56,993 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:01,994 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:06,994 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:11,994 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:16,994 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:21,995 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:26,995 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:31,995 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:36,996 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)
> 2012-12-26 18:46:41,996 DEBUG org.apache.hadoop.hbase.util.Threads: Waiti=
ng for 'commit allowed' latch. (sleep:5000 ms)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrato=
rs
For more information on JIRA, see: http://www.atlassian.com/software/jira