hbase-user mailing list archives

From Tianying Chang <tych...@gmail.com>
Subject Re: Failed to take snapshot due to some region directory is not found
Date Tue, 19 May 2015 20:30:34 GMT
Matteo

We are using HDFS 2.0 + HBase 0.94.7.

I saw this ArrayIndexOutOfBoundsException: 2 error also. What does that
mean?

BTW, other tables (though smaller in terms of region count) in this
same cluster are able to create snapshots; only this table is failing.

Thanks
Tian-Ying
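For context on the trace quoted below: `SnapshotDescription$Type.getValueDescriptor` indexes a fixed list of generated enum value descriptors by ordinal, and the failing index is 2. Since the log shows `type=SKIPFLUSH`, a plausible reading is that a type value unknown to this build's generated protobuf code arrived with ordinal 2, and merely printing the message then blows up in `Arrays$ArrayList.get`. A minimal stand-alone sketch of that failure mode (the two-entry descriptor list here is illustrative, not the real generated table):

```java
import java.util.Arrays;
import java.util.List;

public class EnumDescriptorLookup {
    // Hypothetical stand-in for the fixed descriptor table that generated
    // protobuf code keeps per enum; only two values are known to this build.
    static final List<String> DESCRIPTORS = Arrays.asList("DISABLED", "FLUSH");

    // Mirrors the shape of getValueDescriptor(): a plain ordinal lookup with
    // no range check beyond what the backing array provides.
    static String lookup(int ordinal) {
        try {
            return DESCRIPTORS.get(ordinal);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception class the IPC handler logs below.
            return "out of bounds: " + ordinal;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(1)); // a known ordinal resolves normally
        System.out.println(lookup(2)); // an unknown third ordinal fails the array access
    }
}
```

So the exception itself is just a symptom of logging a snapshot description whose type ordinal the local generated code cannot resolve, not the root cause of the snapshot failure.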

On Tue, May 19, 2015 at 11:50 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
wrote:

> can you debug the protobuf problem? I think we abort because we are not
> able to write:
>
> 2015-05-19 06:00:49,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 50 on 60000 caught: java.lang.ArrayIndexOutOfBoundsException: 2
>         at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>         at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
>         at org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
> ...
>         at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
>         at org.apache.hadoop.hbase.snapshot.HSnapshotDescription.toString(HSnapshotDescription.java:72)
>         at java.lang.String.valueOf(String.java:2826)
>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>         at org.apache.hadoop.hbase.ipc.Invocation.toString(Invocation.java:152)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.toString(HBaseServer.java:304)
> Matteo
>
>
> On Tue, May 19, 2015 at 11:35 AM, Tianying Chang <tychang@gmail.com>
> wrote:
>
> > Actually, I find it does not even print out the debug info below for this
> > table; other tables will print out this logging. So it seems it did not
> > invoke the FlushSnapshotSubprocedure at all.
> >
> >   @Override
> >   public Void call() throws Exception {
> >     // Taking the region read lock prevents the individual region from
> >     // being closed while a snapshot is in progress. This is helpful but
> >     // not sufficient for preventing races with snapshots that involve
> >     // multiple regions and regionservers. It is still possible to have
> >     // an interleaving such that globally regions are missing, so we
> >     // still need the verification step.
> >     LOG.debug("Starting region operation on " + region);
> >
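The comment in the quoted call() describes a read/write-lock pattern: the snapshot subprocedure holds the region's read lock, so a close (which needs the write lock) cannot proceed mid-snapshot. A stand-alone sketch of that pattern, assuming a plain ReentrantReadWriteLock rather than HRegion's actual lock field:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionCloseLock {
    // Hypothetical reduction of the pattern in the quoted comment: snapshot
    // work holds the read lock, closing the region needs the write lock, so
    // a close cannot interleave with an in-progress snapshot.
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    String snapshotRegion() {
        lock.readLock().lock(); // many concurrent readers/snapshots allowed
        try {
            return "snapshot taken";
        } finally {
            lock.readLock().unlock();
        }
    }

    boolean tryCloseWhileSnapshotting() {
        lock.readLock().lock(); // simulate a snapshot still in progress
        try {
            boolean closed = lock.writeLock().tryLock(); // close needs the write lock
            if (closed) {
                lock.writeLock().unlock();
            }
            return closed; // false: write lock unavailable while read-locked
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RegionCloseLock r = new RegionCloseLock();
        System.out.println(r.snapshotRegion());
        System.out.println("close succeeded: " + r.tryCloseWhileSnapshotting());
    }
}
```

As the comment itself says, this only protects one region; races across multiple regions and regionservers still require the master-side verification step.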
> > On Tue, May 19, 2015 at 11:26 AM, Tianying Chang <tychang@gmail.com>
> > wrote:
> >
> > > Hi, Esteban,
> > >
> > > There is no region split in this cluster, since we set the region size
> > > upper bound really high to prevent splitting.
> > >
> > > I think it happens for all the regions of this table.
> > >
> > > I repeatedly ran "hdfs dfs -lsr
> > > /hbase/.hbase-snapshot/ss_rich_pin_data_v1" while taking the snapshot;
> > > no region was able to write into this directory. I also turned on DEBUG
> > > logging on the RS; all RSs just report failure with a timeout, with no
> > > specific reason.
> > >
> > > Thanks
> > > Tian-Ying
> > >
> > > On Tue, May 19, 2015 at 11:06 AM, Esteban Gutierrez <
> > esteban@cloudera.com>
> > > wrote:
> > >
> > >> Hi Tianying,
> > >>
> > >> Is this happening consistently in this region or is it happening
> > >> randomly across other regions too? One possibility is that there was a
> > >> split going on at the time you started to take the snapshot and it
> > >> failed. If you look into /hbase/rich_pin_data_v1 can you find a
> > >> directory named dff681880bb2b23d0351d6656a1dbbb9 in there?
> > >>
> > >> cheers,
> > >> esteban.
> > >>
> > >>
> > >> --
> > >> Cloudera, Inc.
> > >>
> > >>
> > >> On Mon, May 18, 2015 at 11:12 PM, Tianying Chang <tychang@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > We have a cluster that used to be able to take snapshots. But
> > >> > recently, one table failed due to the error below. Other tables on
> > >> > the same cluster are fine.
> > >> >
> > >> > Any idea what could go wrong? Is the table not healthy? But when I
> > >> > run hbase hbck, it reports the cluster healthy.
> > >> >
> > >> > BTW, we are running 94.7, so we need to take a snapshot of the data
> > >> > to export to a new cluster running 94.26 as part of the upgrade (and
> > >> > eventually upgrade to 1.x).
> > >> >
> > >> > Thanks
> > >> > Tian-Ying
> > >> >
> > >> >
> > >> > 2015-05-19 06:00:45,505 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } due to exception:No region directory found for region:{NAME => 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.', STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED => dff681880bb2b23d0351d6656a1dbbb9,}
> > >> > org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: No region directory found for region:{NAME => 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.', STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED => dff681880bb2b23d0351d6656a1dbbb9,}
> > >> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegion(MasterSnapshotVerifier.java:167)
> > >> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:152)
> > >> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:115)
> > >> >         at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:156)
> > >> >         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> > >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >> >         at java.lang.Thread.run(Thread.java:662)
> > >> > 2015-05-19 06:00:45,505 INFO org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Stop taking snapshot={ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } because: Failed to take snapshot '{ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH }' due to exception
> > >> > 2015-05-19 06:00:49,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 50 on 60000 caught: java.lang.ArrayIndexOutOfBoundsException: 2
> > >> >         at java.util.Arrays$ArrayList.get(Arrays.java:3381)
> > >> >         at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
> > >> >         at org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
> > >> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> >         at java.lang.reflect.Method.invoke(Method.java:597)
> > >> >         at com.google.protobuf.GeneratedMessage.invokeOrDie(GeneratedMessage.java:1369)
> > >> >         at com.google.protobuf.GeneratedMessage.access$1400(GeneratedMessage.java:57)
> > >> >         at com.google.protobuf.GeneratedMessage$FieldAccessorTable$SingularEnumFieldAccessor.get(GeneratedMessage.java:1670)
> > >> >         at com.google.protobuf.GeneratedMessage.getField(GeneratedMessage.java:162)
> > >> >         at com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:113)
> > >> >         at com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:152)
> > >> >         at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:228)
> > >> >         at com.google.protobuf.TextFormat$Printer.access$200(TextFormat.java:217)
> > >> >         at com.google.protobuf.TextFormat.print(TextFormat.java:68)
> > >> >         at com.google.protobuf.TextFormat.printToString(TextFormat.java:115)
> > >> >         at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
> > >> >         at org.apache.hadoop.hbase.snapshot.HSnapshotDescription.toString(HSnapshotDescription.java:72)
> > >> >         at java.lang.String.valueOf(String.java:2826)
> > >> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
> > >> >         at org.apache.hadoop.hbase.ipc.Invocation.toString(Invocation.java:152)
> > >> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.toString(HBaseServer.java:304)
> > >> >         at java.lang.String.valueOf(String.java:2826)
> > >> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
> > >> >
> > >>
> > >
> > >
> >
>
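For readers hitting the same error: the CorruptedSnapshotException in the log above is thrown by MasterSnapshotVerifier.verifyRegion when the completed snapshot's working directory lacks a subdirectory named after a region's encoded name. A rough stand-alone sketch of that check, using java.nio.file and a plain IOException in place of the Hadoop FileSystem API and HBase's exception type (directory names here are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RegionDirCheck {
    // Loose sketch of the verifier's per-region check: every region of the
    // table must have a child directory under the snapshot directory named
    // after its encoded name, otherwise the snapshot is declared corrupted.
    static void verifyRegion(Path snapshotDir, String encodedRegionName) throws IOException {
        Path regionDir = snapshotDir.resolve(encodedRegionName);
        if (!Files.isDirectory(regionDir)) {
            throw new IOException("No region directory found for region: " + encodedRegionName);
        }
    }

    public static void main(String[] args) throws IOException {
        Path snapshotDir = Files.createTempDirectory("snapshot");
        Files.createDirectory(snapshotDir.resolve("region-ok"));

        verifyRegion(snapshotDir, "region-ok"); // present: passes silently
        try {
            verifyRegion(snapshotDir, "region-missing"); // absent: fails
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the thread's case the region's directory was never written at all (the subprocedure apparently never ran on the regionservers), so the master-side verification fired for every region of the table.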
