hbase-dev mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: Data loss in MOB snapshot and clone?
Date Fri, 14 Oct 2016 10:56:56 GMT
Thanks Jingcheng

You are probably better placed than I am to describe the true problem, so
please do create the issue.  I'll try to find time next week to offer a
unit test unless someone gets to it first.





On Fri, Oct 14, 2016 at 12:47 PM, Du, Jingcheng <jingcheng.du@intel.com>
wrote:

> Hi Tim,
>
> This does look like an issue. I'll file a JIRA to fix it.
> Some MOB hfiles that are still being flushed are missed by the snapshot.
> As a temporary workaround, you can run 'flush tablename' before running
> 'snapshot tablename snapshotname', which avoids the issue. Thanks again
> for your findings.
>
> Regards,
> Jingcheng
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson100@gmail.com]
> Sent: Friday, October 14, 2016 1:54 PM
> To: dev@hbase.apache.org
> Subject: Re: Data loss in MOB snapshot and clone?
>
> Thanks for trying that Jingcheng
>
> I'll get time to do some testing next week on this and see if I can come
> up with a reproducible test.
> I can confirm that for non-MOB tables it is all fine, and fields below the
> MOB threshold were not lost in the original process.
>
> Cheers,
> Tim
>
> On Thu, Oct 13, 2016 at 5:31 PM, Du, Jingcheng <jingcheng.du@intel.com>
> wrote:
>
> > Hi Tim,
> >
> > Normally after the snapshot is cloned/restored, there will be a .link
> > directory (the format is .link-{hfileName}) in the archive directory
> > of the table for both mob and non-mob tables, and the hfile
> > {hfileName} will be archived to the same directory as the .link
> > directory.
> > The hfile won't be deleted by the file cleaner if the .link directory
> > is not empty, which means this hfile is still referenced by others. The
> > HFileLinkCleaner and SnapshotHFileCleaner cleaners guarantee this.
> >
> > I did the same test based on the code in HBase master for both mob and
> > non-mob tables, and data are not lost.
> >
> > Tim, would you mind trying the steps for normal tables to see if the
> > data will be lost? Just one row is enough for the table. Thanks a lot.
> >
> > Regards,
> > Jingcheng
> >
> > -----Original Message-----
> > From: Tim Robertson [mailto:timrobertson100@gmail.com]
> > Sent: Thursday, October 13, 2016 4:48 PM
> > To: dev@hbase.apache.org
> > Subject: Re: Data loss in MOB snapshot and clone?
> >
> > Thanks Jingcheng
> >
> > Yes, it just references the source MOB data until MOB compaction.
> >
> > Based on that, I think this really is a critical bug.  It allowed the
> > MOBs to be deleted before that happened, resulting in broken references
> > and data loss.  Or am I misunderstanding you?
> >
> >
> >
> > On Thu, Oct 13, 2016 at 9:45 AM, Du, Jingcheng
> > <jingcheng.du@intel.com>
> > wrote:
> >
> > > Hi Tim,
> > >
> > > > > was this running a background task to copy the MOB data when the
> > > > > snapshot was cloned and I just deleted the source before the copy
> > > > > was complete?
> > > > The MOB data can be copied when MOB compaction happens. But the MOB
> > > > files should not be deleted, even if they have not been copied and
> > > > the source table has been deleted. The archive cleaner should keep
> > > > them until all the references are gone. Let me check the code again.
> > >
> > > > > when running "snapshot and clone" it just references the source
> > > > > MOB data until a (?) change?
> > > Yes, it just references the source MOB data until MOB compaction.
> > >
> > > > snapshot and clone just doesn't support MOB?
> > > > It does support MOB.
> > >
> > > Regards,
> > > Jingcheng
> > >
> > > -----Original Message-----
> > > From: Tim Robertson [mailto:timrobertson100@gmail.com]
> > > Sent: Thursday, October 13, 2016 1:56 AM
> > > To: dev@hbase.apache.org
> > > Subject: Re: Data loss in MOB snapshot and clone?
> > >
> > > Thanks - well it is now on the CDH community forum too.
> > >
> > > Jonathan Hsieh pretty much described what I see in his comment on
> > > HBASE-12332
> > > > https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
> > >
> > >
> > >
> > > > On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > >
> > > > > Hi Tim,
> > > > >
> > > > > Having read more of the details, this may not be related to the
> > > > > issue we fixed (which was MOB compaction related).
> > > > I am doing a similar test to see if I can reproduce it.
> > > >
> > > > Thanks,
> > > > Huaxiang
> > > > > > On Oct 12, 2016, at 10:29 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> > > > >
> > > > > Thanks Ted, Huaxiang
> > > > >
> > > > > I'll move this to a Cloudera forum and comment back here if it
> > > > > appears unrelated.
> > > > >
> > > > > > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > > > >
> > > > > >> By the way, I forgot the forum link:
> > > > > >> http://community.cloudera.com/
> > > > >>
> > > > >> Thanks,
> > > > >> Huaxiang
> > > > >>
> > > > > >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > > > >>>
> > > > >>> Hi Tim,
> > > > >>>
> > > > >>>   I believe that it runs into an issue which is specific
to
> > > > >>> cloudera
> > > > >> release we fixed recently. For details, could you discuss it
in
> > > > >> cdh
> > > > forum?
> > > > >>> Copy me(hsun@cloudera.com <mailto:hsun@cloudera.com>
<mailto:
> > > > hsun@cloudera.com <mailto:hsun@cloudera.com>>) in the forum so
I
> > > > >> can explain more there.
> > > > >>>
> > > > >>>   Thanks,
> > > > >>>   Huaxiang
> > > > >>>
> > > > > >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>>
> > > > >>>> Have you looked at HBASE-16578 ?
> > > > >>>>
> > > > >>>> Cheers
> > > > >>>>
> > > > > >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> Hi devs,
> > > > > >>>>> [Had a quick chat with Lars G. about this and before opening
> > > > > >>>>> a Jira I thought I'd raise it here first]
> > > > >>>>>
> > > > >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> > > > >>>>>
> > > > > >>>>> Before I dig into this further, I'd like to just ask if
> > > > > >>>>> anyone has seen this before?
> > > > >>>>>
> > > > > >>>>> The initial state was a table (tim_test) built with MOB
> > > > > >>>>> support and a few tens of millions of rows and tens of
> > > > > >>>>> billions of cells.
> > > > >>>>>
> > > > > >>>>> I wanted to rename the table to get this into production and
> > > > > >>>>> did so as follows:
> > > > >>>>>
> > > > >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> > > > >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> > > > >>>>>
> > > > > >>>>> At this stage the application all looked good, and so I
> > > > > >>>>> continued with:
> > > > >>>>>
> > > > >>>>> delete_snapshot 'tim_test-snapshot'
> > > > >>>>> disable 'tim_test'
> > > > > >>>>> drop 'tim_test'
> > > > >>>>>
> > > > > >>>>> Then things went... awry and data just started dropping out
> > > > > >>>>> in the app. Before long, all MOB data was seemingly gone.
> > > > >>>>>
> > > > > >>>>> The references in the new table MOB folder appear to point
> > > > > >>>>> to the source table (e.g.
> > > > > >>>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> > > > >>>>>
> > > > > >>>>> The RS logs are full of ERRORs like:
> > > > >>>>>
> > > > >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.
> > > > >> regionserver.HStore:
> > > > >>>>> The mob file
> > > > >>>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79e
> > > > >> bfa2ddd66b48
> > > > >>>>> could not be found in the locations
> > > > >>>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/
> > > > >> 14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326
> > > > >> <hdfs://ha-nn/hbase/mobdir/
> > > > <hdfs://ha-nn/hbase/mobdir/>
> > > > >> data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_432
> > > > >> 6>
> > > > >> ,
> > > > >>>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/
> > > > <hdfs://ha-nn/hbase/archive/data/default/tim_test/>
> > > > >> 14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> > > > <hdfs://ha-nn/hbase/archive/
> > > > >> data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_432
> > > > >> 6]
> > > > >> >
> > > > >>>>>
> > > > >>>>> What I don't know is:
> > > > > >>>>> 1) was this running a background task to copy the MOB data
> > > > > >>>>> when the snapshot was cloned and I just deleted the source
> > > > > >>>>> before the copy was complete?
> > > > > >>>>> - or
> > > > > >>>>> 2) when running "snapshot and clone" it just references the
> > > > > >>>>> source MOB data until a (?) change?
> > > > > >>>>> 3) snapshot and clone just doesn't support MOB?
> > > > >>>>>
> > > > > >>>>> Can anyone shed some light on this easily before I dig into
> > > > > >>>>> it please?
> > > > >>>>>
> > > > > >>>>> While this situation exists (at least in 1.0.0) might it be
> > > > > >>>>> good to get info about data loss for MOB tables into the
> > > > > >>>>> snapshot clone docs?
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Tim
> > > >
> > > >
> > >
> >
>
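
A minimal sketch of the temporary workaround suggested earlier in this thread (flush the table before taking the snapshot). It assumes an `hbase` client on the PATH and reuses the table and snapshot names from the original report; piping commands into `hbase shell` through a heredoc is an illustrative choice, not something the thread prescribes. The thread does not confirm when it is safe to delete the snapshot or drop the source table afterwards, so the sketch stops before those steps.

```shell
# Sketch only: flush first so that MOB hfiles still being written are on disk
# before the snapshot is taken (the workaround described in the thread).
# Table and snapshot names are the ones used in the original report.
hbase shell <<'EOF'
flush 'tim_test'
snapshot 'tim_test', 'tim_test-snapshot'
clone_snapshot 'tim_test-snapshot', 'prod_b_map'
EOF
```

The same three commands can be typed directly into an interactive HBase shell session, in the same order.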
