hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tianying Chang <tych...@gmail.com>
Subject Re: Failed to take snapshot due to some region directory is not found
Date Tue, 19 May 2015 18:35:48 GMT
Actually, I find it does not even print out the debug info below for this
table, other table will print out this logging. So it seems it did not
invoke the FlushSnapshotSubprocedure at all.


 @Override
    public Void call() throws Exception {
      // Taking the region read lock prevents the individual region from
being closed while a
      // snapshot is in progress.  This is helpful but not sufficient for
preventing races with
      // snapshots that involve multiple regions and regionservers.  It is
still possible to have
      // an interleaving such that globally regions are missing, so we
still need the verification
      // step.
      LOG.debug("Starting region operation on " + region);

On Tue, May 19, 2015 at 11:26 AM, Tianying Chang <tychang@gmail.com> wrote:

> Hi, Esteban,
>
> There is no region split in this cluster, since we put the region size
> upper bound to be really high to prevent splitting.
>
> I think it happens for all the regions of this table.
>
> I repeatedly run "hdfs dfs -lsr
> /hbase/.hbase-snapshot/ss_rich_pin_data_v1"  while taking snapshot, no
> region was able to write into this direction. I also turn on DEBUG logging
> on RS, all RS  just report fail with Timeout, with no specific reason.
>
> Thanks
> Tian-Ying
>
> On Tue, May 19, 2015 at 11:06 AM, Esteban Gutierrez <esteban@cloudera.com>
> wrote:
>
>> Hi Tianying,
>>
>> Is this happening consistently in this region or is it happening randomly
>> across other regions too? One possibility is that there was a split going
>> on at the time you started to take the snapshot and it failed. If you look
>> into /hbase/rich_pin_data_v1 can you find a directory named
>> dff681880bb2b23d0351d6656a1dbbb9 in there?
>>
>> cheers,
>> esteban.
>>
>>
>> --
>> Cloudera, Inc.
>>
>>
>> On Mon, May 18, 2015 at 11:12 PM, Tianying Chang <tychang@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > We have a cluster that used to be able to take snapshot. But recently,
>> one
>> > table failed due to the error below. Other tables on the same clusters
>> are
>> > fine.
>> >
>> > Any idea what could go wrong? Is the table not healthy? But I run hbase
>> > hbck, it reports cluster healthy.
>> >
>> > BTW, we are running 94.7, so we need to take snapshot of the data to
>> export
>> > to a new cluster of 94.26 as part of upgrade (and eventually upgrade to
>> > 1.x)
>> >
>> > Thanks
>> > Tian-Ying
>> >
>> >
>> > 015-05-19 06:00:45,505 ERROR
>> > org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed
>> taking
>> > snapshot { ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH
>> }
>> > due to exception:No region directory found for region:{NAME =>
>> > 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.',
>> > STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED
=>
>> > dff681880bb2b23d0351d6656a1dbbb9,}
>> > org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: No region
>> > directory found for region:{NAME =>
>> > 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.',
>> > STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED
=>
>> > dff681880bb2b23d0351d6656a1dbbb9,}
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegion(MasterSnapshotVerifier.java:167)
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:152)
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:115)
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:156)
>> >         at
>> > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>> >         at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >         at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >         at java.lang.Thread.run(Thread.java:662)
>> > 2015-05-19 06:00:45,505 INFO
>> > org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Stop taking
>> > snapshot={ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH
>> }
>> > because: Failed to take snapshot '{ ss=ss_rich_pin_data_v1
>> > table=rich_pin_data_v1 type=SKIPFLUSH }' due to exception
>> > 2015-05-19 06:00:49,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>> Server
>> > handler 50 on 60000 caught: java.lang.ArrayIndexOutOfBoundsException: 2
>> >         at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>> >         at
>> > java.util.Collections$UnmodifiableList.get(Collections.java:1152)
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >         at
>> >
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >         at
>> >
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>> >         at
>> >
>> >
>> com.google.protobuf.GeneratedMessage.invokeOrDie(GeneratedMessage.java:1369)
>> >         at
>> >
>> com.google.protobuf.GeneratedMessage.access$1400(GeneratedMessage.java:57)
>> >         at
>> >
>> >
>> com.google.protobuf.GeneratedMessage$FieldAccessorTable$SingularEnumFieldAccessor.get(GeneratedMessage.java:1670)
>> >         at
>> > com.google.protobuf.GeneratedMessage.getField(GeneratedMessage.java:162)
>> >         at
>> >
>> >
>> com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:113)
>> >         at
>> >
>> >
>> com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:152)
>> >         at
>> > com.google.protobuf.TextFormat$Printer.print(TextFormat.java:228)
>> >         at
>> > com.google.protobuf.TextFormat$Printer.access$200(TextFormat.java:217)
>> >         at com.google.protobuf.TextFormat.print(TextFormat.java:68)
>> >         at
>> > com.google.protobuf.TextFormat.printToString(TextFormat.java:115)
>> >         at
>> > com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.snapshot.HSnapshotDescription.toString(HSnapshotDescription.java:72)
>> >         at java.lang.String.valueOf(String.java:2826)
>> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >         at
>> > org.apache.hadoop.hbase.ipc.Invocation.toString(Invocation.java:152)
>> >         at
>> >
>> org.apache.hadoop.hbase.ipc.HBaseServer$Call.toString(HBaseServer.java:304)
>> >         at java.lang.String.valueOf(String.java:2826)
>> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message