hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11282) Load balancer may move a region which is participating in snapshot
Date Tue, 03 Jun 2014 19:03:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017002#comment-14017002 ]

Andrew Purtell commented on HBASE-11282:
----------------------------------------

[~tedyu], do you have a concrete proposal for addressing what you described in the description?

> Load balancer may move a region which is participating in snapshot
> ------------------------------------------------------------------
>
>                 Key: HBASE-11282
>                 URL: https://issues.apache.org/jira/browse/HBASE-11282
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> The region was tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.
> From master log:
> {code}
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Found an existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. destination server is h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812 accepted as a dest server = true
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Using pre-existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.; plan=hri=tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7., src=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165, dest=h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
> 2014-03-10 23:48:09,035 INFO  [AM.ZK.Worker-pool2-t42] master.RegionStates: Transitioned {289ebdee6adf0a3b9c2bbcbe2ff522e7 state=CLOSED, ts=1394495289035, server=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165} to {289ebdee6adf0a3b9c2bbcbe2ff522e7 state=OFFLINE, ts=1394495289035, server=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165}
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] zookeeper.ZKAssign: master:60000-0x244aa9920190b04, quorum=h2-ubuntu12-sec-1394425849-hbase-8.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-1.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal:2181, baseZNode=/hbase Creating (or updating) unassigned node 289ebdee6adf0a3b9c2bbcbe2ff522e7 with OFFLINE state
> 2014-03-10 23:48:09,044 INFO  [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Assigning tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. to h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
> {code}
> From hbase-hbase-regionserver-h2-ubuntu12-sec-1394425849-hbase-9.log :
> {code}
> 2014-03-10 23:48:08,487 WARN  [member: 'h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165' subprocedure-pool1-thread-1] snapshot.RegionServerSnapshotManager: Got Exception in SnapshotSubprocedurePool
> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. is closing
>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>   at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:325)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
>   at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
>   at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.NotServingRegionException: tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. is closing
>   at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5699)
>   at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5663)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:65)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> {code}
> The load balancer's move of the underlying region caused FlushSnapshotSubprocedure to fail.
> A mechanism that makes the load balancer aware of in-progress region operations is desirable, so that a snapshot doesn't fail in the above scenario.
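
One possible shape for such a mechanism, as a minimal sketch only: keep a registry of regions that are currently participating in a snapshot, and have the balancer skip any move plan that touches one of them. Everything below is hypothetical and is not HBase code; the class `SnapshotAwareBalancerSketch`, its `MovePlan` type, and the methods `snapshotStarted`, `snapshotFinished`, and `filterPlans` are names invented for illustration, and the sketch assumes that a skipped plan can simply be recomputed on a later balancer run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not HBase code: a registry of regions that are busy
 * with a snapshot, consulted before balancer move plans are executed.
 */
public class SnapshotAwareBalancerSketch {

    /** Simplified stand-in for a balancer move plan (region, source, destination). */
    public static final class MovePlan {
        public final String encodedRegionName;
        public final String source;
        public final String destination;

        public MovePlan(String encodedRegionName, String source, String destination) {
            this.encodedRegionName = encodedRegionName;
            this.source = source;
            this.destination = destination;
        }
    }

    // Encoded names of regions currently participating in a snapshot.
    private final Set<String> regionsInSnapshot = ConcurrentHashMap.newKeySet();

    /** Called (by assumption) when a snapshot begins working on a region. */
    public void snapshotStarted(String encodedRegionName) {
        regionsInSnapshot.add(encodedRegionName);
    }

    /** Called (by assumption) when the snapshot is done with a region. */
    public void snapshotFinished(String encodedRegionName) {
        regionsInSnapshot.remove(encodedRegionName);
    }

    /** Returns only the plans whose region is not registered as being in a snapshot. */
    public List<MovePlan> filterPlans(List<MovePlan> plans) {
        List<MovePlan> allowed = new ArrayList<>();
        for (MovePlan plan : plans) {
            if (!regionsInSnapshot.contains(plan.encodedRegionName)) {
                allowed.add(plan);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        SnapshotAwareBalancerSketch sketch = new SnapshotAwareBalancerSketch();
        String region = "289ebdee6adf0a3b9c2bbcbe2ff522e7";
        List<MovePlan> plans = Arrays.asList(
            new MovePlan(region, "hbase-9", "hbase-4"),
            new MovePlan("some-other-region", "hbase-9", "hbase-4"));

        sketch.snapshotStarted(region);
        System.out.println("plans allowed during snapshot: " + sketch.filterPlans(plans).size());

        sketch.snapshotFinished(region);
        System.out.println("plans allowed after snapshot: " + sketch.filterPlans(plans).size());
    }
}
```

The check and the subsequent move are not atomic in this sketch, so a snapshot that starts just after `filterPlans` returns could still hit a closing region; a real fix would need coordination between the two paths, for example a lock or a retry on the snapshot side.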



--
This message was sent by Atlassian JIRA
(v6.2#6252)
