hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
Date Fri, 13 Dec 2013 07:02:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847214#comment-13847214
] 

Enis Soztutar commented on HBASE-10136:
---------------------------------------

Matteo you are right in the analysis. The table lock is released before the regions are complete
with opening, because of how region reopening and the master handlers are implemented. The
most clear fix to this is to fix the master itself I think (HBASE-5487).

> Alter table conflicts with concurrent snapshot attempt on that table
> --------------------------------------------------------------------
>
>                 Key: HBASE-10136
>                 URL: https://issues.apache.org/jira/browse/HBASE-10136
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.96.0, 0.98.1, 0.99.0
>            Reporter: Aleksandr Shulman
>            Assignee: Matteo Bertozzi
>              Labels: online_schema_change
>
> Expected behavior:
> A user can issue a request for a snapshot of a table while that table is undergoing an
online schema change and expect that snapshot request to complete correctly. Also, the same
is true if a user issues a online schema change request while a snapshot attempt is ongoing.
> Observed behavior:
> Snapshot attempts time out when there is an ongoing online schema change because the
region is closed and opened during the snapshot. 
> As a side-note, I would expect that the attempt should fail quickly as opposed to timing
out. 
> Further, what I have seen is that subsequent attempts to snapshot the table fail because
of some state/cleanup issues. This is also concerning.
> Immediate error:
> {code}type=FLUSH }' is still in progress!
> 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 10000ms
while waiting for snapshot completion.
> 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status
of snapshot from master...
> 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891):
Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640
type=FLUSH } is done
> 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374):
Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is
still in progress!
> Snapshot failure occurred
> org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't
completed in expectedTime:60000 ms
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
> 	at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
> Likely root cause of error:
> {code}Exception in SnapshotSubprocedurePool
> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException:
changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
is closing
> 	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> 	at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
> 	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
> 	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
> 	at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
> 	at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
is closing
> 	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
> 	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
> 	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	... 5 more{code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Mime
View raw message