hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: clone table(72TB data) failed with socket timeout
Date Mon, 22 Jun 2015 01:02:43 GMT
I looked at RestoreSnapshotHelper.java from the tip of the 0.98 branch but didn't
see where StringBuilder.append() is called.

Can you tell us which HBase release you're using so that the line numbers
can be matched against the source code?
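
If it's handy, the version command in the HBase shell (or hbase version from
the command line) prints the exact release string, e.g.:

    hbase(main):001:0> version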

Thanks



On Thu, Jun 18, 2015 at 11:44 AM, Tianying Chang <tychang@gmail.com> wrote:

> By looking at the master log, I think it failed because our table tsdb has
> lots of empty regions in the source cluster, and ExportSnapshot does not
> copy the empty folders over, so at the destination cluster there are many
> regions without a column family folder, which caused the error. I have now
> manually fixed that issue; however, when I try to run clone_snapshot again,
> it always complains that table tsdb already exists, which is not true. It
> feels like there is a reference to the table somewhere even though the
> previous clone_snapshot failed. Any idea?
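>
> For reference, here is roughly what I am checking from the HBase shell (the
> disable/drop part is just my guess at clearing whatever the failed clone may
> have left behind, not something the docs prescribe):
>
>   exists 'tsdb'                     # does the master think the table is there?
>   # if it shows up as a half-created leftover, dropping it should clear the way:
>   disable 'tsdb'
>   drop 'tsdb'
>   # if exists says false, the leftover might only be the table dir on HDFS or a stale znode
>   # then retry the clone
>   clone_snapshot 'ss_tsdb', 'tsdb'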
>
> Thanks
> Tian-Ying
>
>
>
> 2015-06-18 04:54:16,255 INFO org.apache.hadoop.hbase.util.FSVisitor: No families under region directory:hdfs://opentsdb-prod-namenode001/hbase/.hbase-snapshot/ss_tsdb/ffdee578233cbe2983fe68f2d42f9eb5
> 2015-06-18 04:54:16,255 INFO org.apache.hadoop.hbase.util.FSVisitor: No families under region directory:hdfs://opentsdb-prod-namenode001/hbase/.hbase-snapshot/ss_tsdb/ffe125fa57f32ccfb667884f3051ee05
> 2015-06-18 04:54:16,255 INFO org.apache.hadoop.hbase.util.FSVisitor: No families under region directory:hdfs://opentsdb-prod-namenode001/hbase/.hbase-snapshot/ss_tsdb/fff1014423977a4abb05dffa9c0dcf22
> 2015-06-18 04:54:16,285 INFO org.apache.hadoop.hbase.master.snapshot.SnapshotManager: Clone snapshot=ss_tsdb as table=tsdb
> 2015-06-18 04:54:16,285 INFO org.apache.hadoop.hbase.master.handler.CreateTableHandler: Attempting to create the table tsdb
> 2015-06-18 04:54:16,344 ERROR org.apache.hadoop.hbase.master.snapshot.CloneSnapshotHandler: clone snapshot={ ss=ss_tsdb table=tsdb type=SKIPFLUSH } failed
> java.lang.ArrayIndexOutOfBoundsException: 2
>         at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>         at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
>         at org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at com.google.protobuf.GeneratedMessage.invokeOrDie(GeneratedMessage.java:1369)
>         at com.google.protobuf.GeneratedMessage.access$1400(GeneratedMessage.java:57)
>         at com.google.protobuf.GeneratedMessage$FieldAccessorTable$SingularEnumFieldAccessor.get(GeneratedMessage.java:1670)
>         at com.google.protobuf.GeneratedMessage.getField(GeneratedMessage.java:162)
>         at com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:113)
>         at com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:152)
>         at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:228)
>         at com.google.protobuf.TextFormat$Printer.access$200(TextFormat.java:217)
>         at com.google.protobuf.TextFormat.print(TextFormat.java:68)
>         at com.google.protobuf.TextFormat.printToString(TextFormat.java:115)
>         at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
>         at java.lang.String.valueOf(String.java:2826)
>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:139)
>         at org.apache.hadoop.hbase.master.snapshot.CloneSnapshotHandler.handleCreateHdfsRegions(CloneSnapshotHandler.java:109)
>         at org.apache.hadoop.hbase.master.handler.CreateTableHandler.handleCreateTable(CreateTableHandler.java:181)
>         at org.apache.hadoop.hbase.master.handler.CreateTableHandler.process(CreateTableHandler.java:127)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)
> 2015-06-18 04:54:16,346 ERROR org.apache.hadoop.hbase.master.handler.CreateTableHandler: Error trying to create the table tsdb
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotException: clone snapshot={ ss=ss_tsdb table=tsdb type=SKIPFLUSH } failed
>         at org.apache.hadoop.hbase.master.snapshot.CloneSnapshotHandler.handleCreateHdfsRegions(CloneSnapshotHandler.java:127)
>         at org.apache.hadoop.hbase.master.handler.CreateTableHandler.handleCreateTable(CreateTableHandler.java:181)
>         at org.apache.hadoop.hbase.master.handler.CreateTableHandler.process(CreateTableHandler.java:127)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>         at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>         at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
>         at org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at com.google.protobuf.GeneratedMessage.invokeOrDie(GeneratedMessage.java:1369)
>         at com.google.protobuf.GeneratedMessage.access$1400(GeneratedMessage.java:57)
>         at com.google.protobuf.GeneratedMessage$FieldAccessorTable$SingularEnumFieldAccessor.get(GeneratedMessage.java:1670)
>         at com.google.protobuf.GeneratedMessage.getField(GeneratedMessage.java:162)
>         at com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:113)
>         at com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:152)
>         at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:228)
>         at com.google.protobuf.TextFormat$Printer.access$200(TextFormat.java:217)
>         at com.google.protobuf.TextFormat.print(TextFormat.java:68)
>         at com.google.protobuf.TextFormat.printToString(TextFormat.java:115)
>         at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
>         at java.lang.String.valueOf(String.java:2826)
>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:139)
>         at org.apache.hadoop.hbase.master.snapshot.CloneSnapshotHandler.handleCreateHdfsRegions(CloneSnapshotHandler.java:109)
>         ... 6 more
>
> On Wed, Jun 17, 2015 at 10:15 PM, Tianying Chang <tychang@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to clone a table from a snapshot. The snapshot is reported to
> > be healthy. However, cloning the table failed with a socket timeout error,
> > shown below. BTW, the table is huge, with 72 TB of data. Anyone know why?
> > Is it because the size is so big that some default timeout is not enough?
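> >
> > If so, is bumping the client RPC timeout the right knob? I am guessing at
> > something like this in the client-side hbase-site.xml (the property name and
> > value are my guess; I am not sure it is the setting behind the 10000 ms read
> > timeout in the error):
> >
> >   <property>
> >     <name>hbase.rpc.timeout</name>
> >     <!-- default is 60000 ms; raising it for the long-running clone is just a guess -->
> >     <value>600000</value>
> >   </property>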
> >
> > Thanks
> > Tian-Ying
> >
> > clone_snapshot "ss_tsdb", "tsdb"
> >
> > ERROR: java.net.SocketTimeoutException: Call to opentsdb-prod-namenode001/10.1.208.226:60000 failed on socket timeout exception: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.1.208.226:55096 remote=opentsdb-prod-namenode001/10.1.208.226:60000]
> >
>
