hbase-user mailing list archives

From David Koch <ogd...@googlemail.com>
Subject Re: Snapshot clone error
Date Fri, 31 Jan 2014 19:16:13 GMT
It ended up working. Thank you for your help.

/David


On Fri, Jan 31, 2014 at 4:46 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:

> thanks for the confirmation.
>
> can you try to export the snapshot again and keep the log file if the
> result of the export will be broken again?
> Thanks!
>
> Matteo
>
>
>
> On Fri, Jan 31, 2014 at 3:43 PM, David Koch <ogdude@googlemail.com> wrote:
>
> > Actually, I just noticed - the snapshot on the source cluster is ok, it's the
> > exported snapshot on the destination cluster that's corrupted.
> >
> >
> > On Fri, Jan 31, 2014 at 4:40 PM, David Koch <ogdude@googlemail.com> wrote:
> >
> > > Thanks for your reply,
> > >
> > > As a matter of fact when running with the "-files" option it turns out a
> > > lot of files are missing from the snapshot which I did not manage to
> > > restore. It's possible that hbck was run during snapshotting.
> > >
> > > **************************************************************
> > > BAD SNAPSHOT: 6659 hfile(s) and 0 log(s) missing.
> > > **************************************************************
> > > 78 HFiles (78 in archive), total size 14.3 G (0.00% 0 shared with the
> > > source table)
> > > 0 Logs, total size 0
> > >
> > > 78 files is exactly the number of regions that I found after attempting
> > > restoration.
> > >
> > > We followed standard procedure as described in the manual:
> > > http://hbase.apache.org/book/ops.snapshots.html
> > >
> > > I will try again and make sure no hbck is intervening.
> > >
> > > /David
> > >
> > >
> > > On Fri, Jan 31, 2014 at 4:20 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> > >
> > >> you should use SnapshotInfo with the "-files" option and you'll probably
> > >> see that one snapshot is corrupted.
> > >> in HBase 0.94.15/CDH 4.6 there will be a fix (HBASE-10111) that will
> > >> prevent restoring/cloning a corrupted snapshot.
> > >>
> > >> a corrupted snapshot means that some file contained in the snapshot is
> > >> missing from the .archive.
> > >> that situation may happen if you have removed files by hand, or you ran
> > >> hbck that sidelined the files or similar
> > >> (unless there is a bug somewhere).
> > >> do you remember the steps that you followed? did you use ExportSnapshot?
> > >> did you move the files by hand to another cluster or similar?
> > >>
> > >> Offline or online snapshot shouldn't make a difference, the corruption
> > >> probably happened after taking the snapshot.
> > >> You can retry taking the snapshot, and periodically run SnapshotInfo with
> > >> the -files option to verify the state and post the logs in case you get a
> > >> corruption again.
> > >>
> > >> Matteo
> > >>
> > >>
> > >>
> > >> On Fri, Jan 31, 2014 at 3:10 PM, David Koch <ogdude@googlemail.com> wrote:
> > >>
> > >> > Matteo,
> > >> >
> > >> > Thank you for your reply. All clients and servers are using the same version:
> > >> >
> > >> > 14/01/31 16:06:20 INFO util.VersionInfo: HBase 0.94.6-cdh4.5.0
> > >> >
> > >> > Also, the information generated by:
> > >> >
> > >> > hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot
> > >> >
> > >> > is identical for snapshots which I managed to clone and those for which the
> > >> > cloning/restoration failed. Would you advise retrying snapshotting the
> > >> > table while it is disabled? Otherwise I'll go with old-fashioned CopyTable
> > >> > or re-import into HBase from HDFS files.
> > >> >
> > >> > Thank you,
> > >> >
> > >> > /David
> > >> >
> > >> >
> > >> > On Fri, Jan 31, 2014 at 2:42 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> > >> >
> > >> > > the snapshot seems to be corrupted, which version are you running?
> > >> > >
> > >> > > Matteo
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Fri, Jan 31, 2014 at 1:06 PM, David Koch <ogdude@googlemail.com> wrote:
> > >> > >
> > >> > > > Hello,
> > >> > > >
> > >> > > > We export an online snapshot of a table to a different cluster. When
> > >> > > > attempting a clone on the destination cluster using:
> > >> > > >
> > >> > > > clone_snapshot 'table_source_snapshot', 'table_dest'
> > >> > > >
> > >> > > > it does not work.
> > >> > > >
> > >> > > > The operation times out after a while:
> > >> > > >
> > >> > > > ERROR: java.io.IOException: Table 'table_dest' not yet enabled, after 1996939ms.
> > >> > > >
> > >> > > > and I see only a fraction of the number of regions in the destination
> > >> > > > table. The table is indicated as "enabled" but I cannot perform any scans
> > >> > > > on it.
> > >> > > >
> > >> > > > The snapshot info returns the following:
> > >> > > >
> > >> > > > Snapshot Info
> > >> > > > ----------------------------------------
> > >> > > >    Name: table_source_snapshot
> > >> > > >    Type: FLUSH
> > >> > > >   Table: table_source
> > >> > > >  Format: 0
> > >> > > > Created: 2014-01-30T13:05:02
> > >> > > >
> > >> > > > Snapshot seems to be intact. What could be the error? Should I take an
> > >> > > > offline snapshot instead? Going via restore/enable instead of clone does
> > >> > > > not seem to work either.
> > >> > > >
> > >> > > > Also, I see the following in the region servers:
> > >> > > >
> > >> > > > 2:24:35.807 PM ERROR
> > >> > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler
> > >> > > > Failed open of
> > >> > > > region=table_source,\x82\x12Y\x00\xE98C\xEE\xBC\xCC\xE3h\xDAPt\xA6,1366259070788.63ca017ac7cd03e68c35a4da8b56421d.,
> > >> > > > starting to roll back the global memstore size.
> > >> > > > java.io.IOException: java.io.IOException: java.io.FileNotFoundException:
> > >> > > > Unable to open link: org.apache.hadoop.hbase.io.HFileLink
> > >> > > > locations=[hdfs://nameservice1/hbase/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c,
> > >> > > > hdfs://nameservice1/hbase/.tmp/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c,
> > >> > > > hdfs://nameservice1/hbase/.archive/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c]
> > >> > > >
> > >> > > > None of these paths actually exist, however:
> > >> > > >
> > >> > > > hdfs://nameservice1/hbase/.snapshot/table_source_snapshot/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c
> > >> > > > does exist.
> > >> > > >
> > >> > > > I don't think that's the issue though, since I applied the same steps to a
> > >> > > > smaller table and it worked.
> > >> > > >
> > >> > > > Any advice is appreciated,
> > >> > > >
> > >> > > > Regards,
> > >> > > >
> > >> > > > /David
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
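
Editor's sketch: the periodic "-files" check suggested in the thread could be scripted by scanning the SnapshotInfo text for the BAD SNAPSHOT banner. This assumes the banner wording is exactly what David pasted above (HBase 0.94.6-cdh4.5.0); other versions may print something different. The function name missing_files and the sample variable are made up for illustration and are not part of HBase.

```python
import re

# Banner format copied from the output quoted in this thread, e.g.
# "BAD SNAPSHOT: 6659 hfile(s) and 0 log(s) missing."
# Assumption: other HBase versions may word this line differently.
BAD_SNAPSHOT = re.compile(
    r"BAD SNAPSHOT:\s*(\d+)\s+hfile\(s\)\s+and\s+(\d+)\s+log\(s\)\s+missing"
)


def missing_files(snapshot_info_output):
    """Return (missing_hfiles, missing_logs) parsed from SnapshotInfo text.

    Returns (0, 0) when no BAD SNAPSHOT banner is found.
    """
    match = BAD_SNAPSHOT.search(snapshot_info_output)
    if match is None:
        return (0, 0)
    return (int(match.group(1)), int(match.group(2)))


# Sample text taken from the message above.
sample = """\
**************************************************************
BAD SNAPSHOT: 6659 hfile(s) and 0 log(s) missing.
**************************************************************
78 HFiles (78 in archive), total size 14.3 G (0.00% 0 shared with the source table)
0 Logs, total size 0
"""

print(missing_files(sample))
```

In practice the text would come from saving the output of the SnapshotInfo command with the "-files" option to a file and reading it back. A non-zero first value is the signal to keep the export logs, as Matteo asked.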

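Editor's sketch: the FileNotFoundException in the original message lists three candidate locations for the linked HFile: the table directory, .tmp and .archive. The snippet below only rebuilds that list from its parts, to make it clear why a file that exists solely under .snapshot is not found. It mirrors the paths in the log above and is not HBase's own code; the function name hfile_link_locations is hypothetical, and the directory layout may differ in other HBase versions.

```python
def hfile_link_locations(root, table, region, family, hfile):
    """Rebuild the three candidate paths shown in the exception above.

    Assumption: layout is <root>/[.tmp|.archive]/<table>/<region>/<family>/<hfile>,
    as seen in the quoted log. This is an illustration, not an HBase API.
    """
    root = root.rstrip("/")
    relative = "/".join([table, region, family, hfile])
    return [
        f"{root}/{relative}",           # live table directory
        f"{root}/.tmp/{relative}",      # temporary directory
        f"{root}/.archive/{relative}",  # archive directory
    ]


# Values taken from the log in the original message.
for path in hfile_link_locations(
    "hdfs://nameservice1/hbase",
    "table_source",
    "816bb88c6f3524a877f4cb7ce747fec1",
    "t",
    "c3b37dc11e684626a5b464a25a75735c",
):
    print(path)
```

None of the three paths points into .snapshot, which matches the observation that the file was present there yet the region still failed to open.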