hbase-issues mailing list archives

From "Anirban Roy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-19681) Online snapshot creation failing with missing store file
Date Wed, 03 Jan 2018 05:04:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirban Roy updated HBASE-19681:
--------------------------------
    Attachment:     (was: region-server-missing file-log.doc)

> Online snapshot creation failing with missing store file
> --------------------------------------------------------
>
>                 Key: HBASE-19681
>                 URL: https://issues.apache.org/jira/browse/HBASE-19681
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore, snapshots
>    Affects Versions: 1.3.0
>         Environment: Hadoop - 2.7.3
> HBase 1.3.0
> OS - GNU/Linux x86_64
> Cluster - Amazon Elastic Mapreduce
>            Reporter: Anirban Roy
>         Attachments: region-server-missing file-log.doc
>
>
> We are facing a problem creating an online snapshot of our HBase table. The table contains
20 TB of data and receives ~10000 writes per second. Snapshot creation fails intermittently
with an error that an hfile is missing; see the detailed output below. Once we locate the region
server hosting the region and restart it, snapshot creation succeeds. It seems
the missing hfile was removed by a minor compaction, but the region server still holds a pointer
to the file.
> [hadoop@ip-10-0-12-164 ~]$ hbase shell
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Type "exit<RETURN>" to leave the HBase Shell
> Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
>  
> hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
>  
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=x_snapshot
table=x_table type=FLUSH } had an error.  Procedure x_snapshot { waiting=[] done=[ip-10-0-9-31.ec2.internal,16020,1508372578254,
ip-10-0-0-32.ec2.internal,16020,1508372591059, ip-10-0-14-221.ec2.internal,16020,1508372580873,
ip-10-0-15-185.ec2.internal,16020,1508372588507, ip-10-0-9-43.ec2.internal,16020,1508372569107,
ip-10-0-10-62.ec2.internal,16020,1512885921693, ip-10-0-8-216.ec2.internal,16020,1508372584133,
ip-10-0-1-207.ec2.internal,16020,1508372580144, ip-10-0-0-173.ec2.internal,16020,1508372584969,
ip-10-0-4-79.ec2.internal,16020,1508372587161, ip-10-0-3-165.ec2.internal,16020,1508372593566,
ip-10-0-14-137.ec2.internal,16020,1508372583225, ip-10-0-6-33.ec2.internal,16020,1508372581587,
ip-10-0-15-199.ec2.internal,16020,1508372587478, ip-10-0-5-253.ec2.internal,16020,1508372581243,
ip-10-0-1-99.ec2.internal,16020,1508372609684] }
>         at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
>         at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via
ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>         at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>         at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
>         at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
>         ... 6 more
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException:
File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>         at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
>         at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
>         at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
>         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:187)
>         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>  
> Here is some help for this command:
> Take a snapshot of specified table. Examples:
>  
>   hbase> snapshot 'sourceTable', 'snapshotName'
>   hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}
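
The workaround described above depends on finding which region, and which region server, the failing snapshot refers to. Both can be read from the error text: the missing path follows the usual HBase layout .../data/<namespace>/<table>/<encoded region>/<column family>/<hfile>, and the "via <host>,<port>,<startcode>:" fragment names the server that reported the failure. The following is only a minimal sketch of pulling those fields out of the message; the function and class names are hypothetical, and the regular expressions assume the message format shown in this report.

```python
import re
from typing import NamedTuple, Optional


class StoreFilePath(NamedTuple):
    """Components of a store file path (hypothetical helper type)."""
    namespace: str
    table: str
    region: str   # encoded region name (32 hex characters)
    family: str   # column family directory
    hfile: str    # store file name


# Assumes the path layout .../data/<namespace>/<table>/<region>/<family>/<hfile>
_STORE_FILE_RE = re.compile(
    r"File does not exist:\s*\S*?/data/"
    r"(?P<namespace>[^/\s]+)/(?P<table>[^/\s]+)/"
    r"(?P<region>[0-9a-f]{32})/(?P<family>[^/\s]+)/(?P<hfile>[^/\s]+)"
)

# Assumes the "via <host>,<port>,<startcode>:" form seen in the ForeignException text
_VIA_SERVER_RE = re.compile(r"via\s+(?P<server>[^\s,]+,\d+,\d+):")


def parse_missing_store_file(error_text: str) -> Optional[StoreFilePath]:
    """Return the first missing store file named in the error text, if any."""
    match = _STORE_FILE_RE.search(error_text)
    if match is None:
        return None
    return StoreFilePath(**match.groupdict())


def parse_reporting_server(error_text: str) -> Optional[str]:
    """Return the server name that reported the failure, if present."""
    match = _VIA_SERVER_RE.search(error_text)
    return match.group("server") if match else None


if __name__ == "__main__":
    # Excerpt copied from the error output in this report.
    sample = (
        "Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via\n"
        "ip-10-0-3-13.ec2.internal,16020,1508372563772:"
        "org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:\n"
        "java.io.FileNotFoundException: File does not exist: "
        "hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/"
        "ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523"
    )
    print(parse_reporting_server(sample))
    print(parse_missing_store_file(sample))
```

Applied to the output in this report, the sketch points at region ecbb3aeaf7c5b1f65742deab5812362c, column family d, reported by ip-10-0-3-13.ec2.internal, a server that does not appear in the procedure's done=[...] list.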



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
