hbase-user mailing list archives

From David Koch <ogd...@googlemail.com>
Subject Re: Snapshot clone error
Date Fri, 31 Jan 2014 15:43:55 GMT
Actually, I just noticed - the snapshot on the source cluster is OK, it's the
exported snapshot on the destination cluster that's corrupted.
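
One way to confirm this on the destination cluster is to run SnapshotInfo
with "-files" there and look for the BAD SNAPSHOT banner quoted further down.
A minimal sketch, assuming the snapshot is still called table_source_snapshot
as in the original post. The helper name missing_hfiles is made up for
illustration; it only parses the banner text and is not part of HBase:

```shell
# Hypothetical helper: read SnapshotInfo output on stdin and print the number
# of missing HFiles from the first "BAD SNAPSHOT" banner, or 0 if there is none.
missing_hfiles() {
  awk '
    /BAD SNAPSHOT:/ && !found {
      # The count is the field right after the literal "SNAPSHOT:" token.
      for (i = 1; i < NF; i++)
        if ($i == "SNAPSHOT:") { print $(i + 1); found = 1; break }
    }
    END { if (!found) print 0 }
  '
}

# Usage on the destination cluster (needs a working hbase client):
# hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
#   -snapshot table_source_snapshot -files | missing_hfiles
```

A non-zero result on the destination but 0 on the source would point at the
export step rather than at the snapshot itself.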


On Fri, Jan 31, 2014 at 4:40 PM, David Koch <ogdude@googlemail.com> wrote:

> Thanks for your reply,
>
> As a matter of fact when running with the "-files" option it turns out a
> lot of files are missing from the snapshot which I did not manage to
> restore. It's possible that hbck was run during snapshotting.
>
> **************************************************************
> BAD SNAPSHOT: 6659 hfile(s) and 0 log(s) missing.
> **************************************************************
> 78 HFiles (78 in archive), total size 14.3 G (0.00% 0 shared with the
> source table)
> 0 Logs, total size 0
>
> 78 files is exactly the number of regions that I found after attempting
> restoration.
>
> We followed standard procedure as described in the manual:
> http://hbase.apache.org/book/ops.snapshots.html
>
> I will try again and make sure no hbck is intervening.
>
> /David
>
>
> On Fri, Jan 31, 2014 at 4:20 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>
>> You should use SnapshotInfo with the "-files" option, and you'll probably
>> see that one snapshot is corrupted.
>> In HBase 0.94.15/CDH 4.6 there will be a fix (HBASE-10111) that will
>> prevent restoring or cloning a corrupted snapshot.
>>
>> A corrupted snapshot means that some file contained in the snapshot is
>> missing from the .archive.
>> That situation may happen if you have removed files by hand, or if you ran
>> hbck and it sidelined the files, or similar
>> (unless there is a bug somewhere).
>> Do you remember the steps that you followed? Did you use ExportSnapshot?
>> Did you move the files by hand to another cluster, or similar?
>>
>> Offline or online snapshot shouldn't make a difference; the corruption
>> probably happened after taking the snapshot.
>> You can retry taking the snapshot, and periodically run SnapshotInfo with
>> the -files option to verify the state, and post the logs in case you get a
>> corruption again.
>>
>> Matteo
>>
>>
>>
>> On Fri, Jan 31, 2014 at 3:10 PM, David Koch <ogdude@googlemail.com> wrote:
>>
>> > Matteo,
>> >
>> > Thank you for your reply. All clients and servers are using the same
>> > version:
>> >
>> > 14/01/31 16:06:20 INFO util.VersionInfo: HBase 0.94.6-cdh4.5.0
>> >
>> > Also, the information generated by:
>> >
>> > hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot
>> >
>> > is identical for snapshots which I managed to clone and those for which the
>> > cloning/restoration failed. Would you advise retrying the snapshot of the
>> > table while it is disabled? Otherwise I'll go with old-fashioned CopyTable
>> > or re-import into HBase from HDFS files.
>> >
>> > Thank you,
>> >
>> > /David
>> >
>> >
>> > On Fri, Jan 31, 2014 at 2:42 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>> >
>> > > the snapshot seems to be corrupted, which version are you running?
>> > >
>> > > Matteo
>> > >
>> > >
>> > >
>> > > On Fri, Jan 31, 2014 at 1:06 PM, David Koch <ogdude@googlemail.com> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > We exported an online snapshot of a table to a different cluster. When
>> > > > attempting a clone on the destination cluster using:
>> > > >
>> > > > clone_snapshot 'table_source_snapshot', 'table_dest'
>> > > >
>> > > > it does not work.
>> > > >
>> > > > The operation times out after a while:
>> > > >
>> > > > ERROR: java.io.IOException: Table 'table_dest' not yet enabled, after
>> > > > 1996939ms.
>> > > >
>> > > > and I see only a fraction of the number of regions in the destination
>> > > > table. The table is indicated as "enabled" but I cannot perform any
>> > > > scans on it.
>> > > >
>> > > > The snapshot info returns the following:
>> > > >
>> > > > Snapshot Info
>> > > > ----------------------------------------
>> > > >    Name: table_source_snapshot
>> > > >    Type: FLUSH
>> > > >   Table: table_source
>> > > >  Format: 0
>> > > > Created: 2014-01-30T13:05:02
>> > > >
>> > > > The snapshot seems to be intact. What could be the error? Should I take
>> > > > an offline snapshot instead? Going via restore/enable instead of clone
>> > > > does not seem to work either.
>> > > >
>> > > > Also, I see the following in the region servers:
>> > > >
>> > > > 2:24:35.807 PM ERROR
>> > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler
>> > > > Failed open of
>> > > > region=table_source,\x82\x12Y\x00\xE98C\xEE\xBC\xCC\xE3h\xDAPt\xA6,1366259070788.63ca017ac7cd03e68c35a4da8b56421d.,
>> > > > starting to roll back the global memstore size.
>> > > > java.io.IOException: java.io.IOException: java.io.FileNotFoundException:
>> > > > Unable to open link: org.apache.hadoop.hbase.io.HFileLink
>> > > > locations=[hdfs://nameservice1/hbase/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c,
>> > > > hdfs://nameservice1/hbase/.tmp/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c,
>> > > > hdfs://nameservice1/hbase/.archive/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c]
>> > > >
>> > > > None of these paths actually exist, however:
>> > > >
>> > > > hdfs://nameservice1/hbase/.snapshot/table_source_snapshot/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c
>> > > > does exist.
>> > > >
>> > > > I don't think that's the issue though, since I applied the same steps
>> > > > to a smaller table and it worked.
>> > > >
>> > > > Any advice is appreciated,
>> > > >
>> > > > Regards,
>> > > >
>> > > > /David
>> > > >
>> > >
>> >
>>
>
>
