Date: Fri, 13 Dec 2013 17:07:08 +0000 (UTC)
From: "Jonathan Hsieh (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table

[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847658#comment-13847658 ]

Jonathan Hsieh commented on HBASE-10136:
----------------------------------------

bq. The most clear fix to this is to fix the master itself I think (HBASE-5487).

While I think this kind of "race" is something that the new HBASE-5487 designs should handle, I disagree that that is the clearest way. I do think open/close/open cases can be handled within the current framework.
> Alter table conflicts with concurrent snapshot attempt on that table
> --------------------------------------------------------------------
>
>                 Key: HBASE-10136
>                 URL: https://issues.apache.org/jira/browse/HBASE-10136
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.96.0, 0.98.1, 0.99.0
>            Reporter: Aleksandr Shulman
>            Assignee: Matteo Bertozzi
>              Labels: online_schema_change
>
> Expected behavior:
> A user can request a snapshot of a table while that table is undergoing an online schema change and expect the snapshot request to complete correctly. The same should hold if a user issues an online schema change request while a snapshot attempt is in progress.
> Observed behavior:
> Snapshot attempts time out when an online schema change is in progress, because the region is closed and reopened during the snapshot.
> As a side note, I would expect the attempt to fail quickly rather than time out.
> I have also seen subsequent attempts to snapshot the table fail because of some state/cleanup issues, which is also concerning.
> Immediate error:
> {code}type=FLUSH }' is still in progress!
> 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 10000ms while waiting for snapshot completion.
> 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master...
> 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
> 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress!
> Snapshot failure occurred
> org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:60000 ms
>     at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
>     at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
> Likely root cause of error:
> {code}Exception in SnapshotSubprocedurePool
> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>     at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
>     at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
>     at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing
>     at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
>     at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     ... 5 more{code}
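The reporter expects a snapshot attempt to fail quickly rather than time out. A minimal, hypothetical sketch of that fail-fast behavior is a client-side loop that polls a snapshot's status and returns as soon as a failure is reported, instead of sleeping until the overall timeout expires. This is not HBase's HBaseAdmin code: SnapshotWaitSketch, Status, Outcome, Clock, FakeClock and waitForSnapshot are invented names, and the sketch assumes the master is able to tell a polling client that the snapshot has failed (for example after a region server hits NotServingRegionException). The 60000 ms timeout and 10000 ms poll interval mirror the values in the logs quoted in the issue.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not HBase code: poll a snapshot's status until it
 * completes, fails, or the deadline passes. A reported failure ends the wait
 * immediately instead of consuming the whole timeout.
 */
final class SnapshotWaitSketch {
    /** Status that a (hypothetical) master poll might report. */
    enum Status { IN_PROGRESS, DONE, FAILED }

    enum Outcome { COMPLETED, FAILED_FAST, TIMED_OUT }

    /** Time source, abstracted so the loop can be exercised without real sleeps. */
    interface Clock {
        long nowMs();
        void sleepMs(long ms) throws InterruptedException;
    }

    /** Clock backed by the system time and Thread.sleep. */
    static final Clock SYSTEM = new Clock() {
        public long nowMs() { return System.currentTimeMillis(); }
        public void sleepMs(long ms) throws InterruptedException { Thread.sleep(ms); }
    };

    /** Deterministic clock: "sleeping" simply advances the current time. */
    static final class FakeClock implements Clock {
        private long now;
        public long nowMs() { return now; }
        public void sleepMs(long ms) { now += ms; }
    }

    static Outcome waitForSnapshot(Supplier<Status> pollMaster, Clock clock,
                                   long timeoutMs, long pollIntervalMs)
            throws InterruptedException {
        long deadline = clock.nowMs() + timeoutMs;
        while (clock.nowMs() < deadline) {
            Status status = pollMaster.get();
            if (status == Status.DONE) {
                return Outcome.COMPLETED;
            }
            if (status == Status.FAILED) {
                // Surface the failure now rather than sleeping until the deadline.
                return Outcome.FAILED_FAST;
            }
            clock.sleepMs(pollIntervalMs);
        }
        return Outcome.TIMED_OUT;
    }
}
```

With a poll that keeps reporting IN_PROGRESS, waitForSnapshot(poll, new FakeClock(), 60000, 10000) returns TIMED_OUT after the full 60000 ms of simulated time, which corresponds to the timeout the reporter observed; a poll that reports FAILED on its second call returns FAILED_FAST after 10000 ms. The Clock indirection exists only so the loop can be tested deterministically.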