hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11282) Load balancer may move a region which is participating in snapshot
Date Tue, 03 Jun 2014 22:21:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017215#comment-14017215
] 

Ted Yu commented on HBASE-11282:
--------------------------------

Will come up with something after Hadoop Summit.

One approach is to let load balancer know the regions whose table is under table lock. For
the duration of active table lock, the regions of the table wouldn't be moved.

> Load balancer may move a region which is participating in snapshot
> ------------------------------------------------------------------
>
>                 Key: HBASE-11282
>                 URL: https://issues.apache.org/jira/browse/HBASE-11282
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> The region was tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.
> From master log:
> {code}
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Found
an existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.       destination
server is h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812 accepted
as a dest server = true
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Using
pre-existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.;     plan=hri=tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.,
src=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165, dest=h2-ubuntu12-sec-
    1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
> 2014-03-10 23:48:09,035 INFO  [AM.ZK.Worker-pool2-t42] master.RegionStates: Transitioned
{289ebdee6adf0a3b9c2bbcbe2ff522e7 state=CLOSED, ts=1394495289035, server=h2-       ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165}
to {289ebdee6adf0a3b9c2bbcbe2ff522e7 state=OFFLINE, ts=1394495289035, server=h2-ubuntu12-sec-
       1394425849-hbase-9.cs1cloud.internal,60020,1394494962165}
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] zookeeper.ZKAssign: master:60000-0x244aa9920190b04,
quorum=h2-ubuntu12-sec-1394425849-hbase-8.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-1.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal:2181,
baseZNode=/hbase Creating (or updating) unassigned     node 289ebdee6adf0a3b9c2bbcbe2ff522e7
with OFFLINE state
> 2014-03-10 23:48:09,044 INFO  [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Assigning
tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. to h2-ubuntu12-sec-    1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
> {code}
> From hbase-hbase-regionserver-h2-ubuntu12-sec-1394425849-hbase-9.log :
> {code}
> 2014-03-10 23:48:08,487 WARN  [member: 'h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165'
subprocedure-pool1-thread-1] snapshot.                    RegionServerSnapshotManager: Got
Exception in SnapshotSubprocedurePool
> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException:
tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. is closing
>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>   at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:325)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
>   at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
>   at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.NotServingRegionException: tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.
is closing
>   at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5699)
>   at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5663)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:65)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> {code}
> Load balancer's move of the underlying region caused FlushSnapshotSubprocedure to fail.
> Mechanism of making load balancer be aware of region operation is desirable such that
snapshot doesn't fail due to the above scenario.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message