hbase-dev mailing list archives

From "Du, Jingcheng" <jingcheng...@intel.com>
Subject RE: Data loss in MOB snapshot and clone?
Date Thu, 13 Oct 2016 15:31:55 GMT
Hi Tim,

Normally, after a snapshot is cloned or restored, there will be a .link directory (named
.link-{hfileName}) in the archive directory of the table, for both MOB and non-MOB tables,
and the hfile {hfileName} will be archived to the same directory as the .link directory.
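To make the layout concrete, here is a minimal Python sketch that builds both paths for one archived hfile. It assumes the archive layout archive/data/{namespace}/{table}/{region}/{family} that appears in the error log later in this thread, and it uses the `.link-` prefix exactly as spelled in this message, kept as a parameter because the exact spelling may differ between HBase versions. The helper `archive_paths` and the file name `abc123` are hypothetical.

```python
from pathlib import PurePosixPath

# Prefix as written in the message above; an assumption, check it against
# the HBase version you actually run.
LINK_DIR_PREFIX = ".link-"


def archive_paths(root: str, namespace: str, table: str, region: str,
                  family: str, hfile: str, prefix: str = LINK_DIR_PREFIX):
    """Return (archived hfile path, back-reference directory path).

    Hypothetical helper: both paths live in the same archive family
    directory, which is how the message describes the layout.
    """
    family_dir = PurePosixPath(root, "archive", "data", namespace, table,
                               region, family)
    return family_dir / hfile, family_dir / f"{prefix}{hfile}"


# Table, region and family come from this thread; "abc123" is made up.
hfile_path, link_dir = archive_paths(
    "/hbase", "default", "tim_test", "14bf5f1737ac65c34615ed97c0b7de06",
    "EPSG_4326", "abc123")
print(hfile_path)
print(link_dir)
```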

The file cleaner won't delete the hfile while the .link directory is not empty, since that
means the hfile is still referenced by other tables. The HFileLinkCleaner and SnapshotHFileCleaner
cleaners guarantee this.
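The retention rule can be modeled in a few lines of Python. This is a simplified sketch, not HBase's cleaner code: `ArchivedFile` and `deletable` are hypothetical names, one check stands in for HFileLinkCleaner (back-references still present) and one for SnapshotHFileCleaner (file still listed by a snapshot). The usage lines walk through the snapshot, clone and delete sequence from Tim's report.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedFile:
    """Simplified view of one archived hfile (hypothetical model)."""
    name: str
    # Entries inside the file's .link directory; non-empty means "still referenced".
    back_references: set[str] = field(default_factory=set)


def deletable(f: ArchivedFile, snapshot_files: set[str]) -> bool:
    """Model of the two checks described above, not HBase's actual code."""
    if f.back_references:          # HFileLinkCleaner role: some table links to it
        return False
    if f.name in snapshot_files:   # SnapshotHFileCleaner role: a snapshot lists it
        return False
    return True


# "prod_b_map" as a back-reference entry name is hypothetical.
f = ArchivedFile("abc123", back_references={"prod_b_map"})
print(deletable(f, snapshot_files={"abc123"}))  # -> False: snapshot and link both keep it
print(deletable(f, snapshot_files=set()))       # -> False: snapshot deleted, link still keeps it
f.back_references.clear()                       # the clone no longer references the file
print(deletable(f, snapshot_files=set()))       # -> True
```

Under this model, the steps in Tim's report leave the file protected until the clone's back-reference disappears.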

I ran the same test against the current HBase master code for both MOB and non-MOB tables,
and no data was lost.

Tim, would you mind repeating the steps with a normal (non-MOB) table to see whether data
is lost? A table with a single row is enough. Thanks a lot.

Regards,
Jingcheng

-----Original Message-----
From: Tim Robertson [mailto:timrobertson100@gmail.com] 
Sent: Thursday, October 13, 2016 4:48 PM
To: dev@hbase.apache.org
Subject: Re: Data loss in MOB snapshot and clone?

Thanks Jingcheng

Yes, it just references the source MOB data until MOB compaction.

Based on that, I think this really is a critical bug: it allowed the MOB files to be deleted
before compaction happened, leaving broken references and data loss. Or am I misunderstanding you?



On Thu, Oct 13, 2016 at 9:45 AM, Du, Jingcheng <jingcheng.du@intel.com> wrote:

> Hi Tim,
>
> > was this running a background task to copy the MOB data when the
> > snapshot was cloned and I just deleted the source before the copy was
> > complete?
> The MOB data is copied when MOB compaction happens. But the MOB files
> should not be deleted even if they have not been copied yet and the
> source table has been deleted: the archive cleaner should keep them
> until all the references are gone. Let me check the code again.
>
> > when running "snapshot and clone" it just references the source MOB
> > data until a (?) change?
> Yes, it just references the source MOB data until MOB compaction.
>
> > snapshot and clone just doesn't support MOB?
> It does support MOB.
>
> Regards,
> Jingcheng
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson100@gmail.com]
> Sent: Thursday, October 13, 2016 1:56 AM
> To: dev@hbase.apache.org
> Subject: Re: Data loss in MOB snapshot and clone?
>
> Thanks - well it is now on the CDH community forum too.
>
> Jonathan Hsieh pretty much described what I see in his comment on HBASE-12332:
> https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
>
>
>
> On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun <hsun@cloudera.com> wrote:
>
> > Hi Tim,
> >
> > Having read more of the details, I think it may not be related to the
> > issue we fixed (which was MOB-compaction related).
> > I am doing a similar test to see if I can reproduce it.
> >
> > Thanks,
> > Huaxiang
> > > On Oct 12, 2016, at 10:29 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> > >
> > > Thanks Ted, Huaxiang
> > >
> > > I'll move this to a Cloudera forum and comment back here if it 
> > > appears unrelated.
> > >
> > > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > >
> > >> By the way, I forgot the forum link: http://community.cloudera.com/
> > >>
> > >> Thanks,
> > >> Huaxiang
> > >>
> > >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > >>>
> > >>> Hi Tim,
> > >>>
> > >>>   I believe you ran into an issue, specific to the Cloudera release,
> > >>> that we fixed recently. For details, could you discuss it in the CDH forum?
> > >>> Copy me (hsun@cloudera.com) in the forum so I can explain more there.
> > >>>
> > >>>   Thanks,
> > >>>   Huaxiang
> > >>>
> > >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>>
> > >>>> Have you looked at HBASE-16578 ?
> > >>>>
> > >>>> Cheers
> > >>>>
> > >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi devs,
> > >>>>> [Had a quick chat with Lars G. about this and before opening a
> > >>>>> Jira I thought I'd raise it here first]
> > >>>>>
> > >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> > >>>>>
> > >>>>> Before I dig into this further, I'd like to just ask if anyone has
> > >>>>> seen this before?
> > >>>>>
> > >>>>> The initial state was a table (tim_test) built with MOB support and a
> > >>>>> few tens of millions of rows and tens of billions of cells.
> > >>>>>
> > >>>>> I wanted to rename the table to get this into production and did so as
> > >>>>> follows:
> > >>>>>
> > >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> > >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> > >>>>>
> > >>>>> At this stage everything in the application looked good, so I continued
> > >>>>> with:
> > >>>>>
> > >>>>> delete_snapshot 'tim_test-snapshot'
> > >>>>> disable 'tim_test'
> > >>>>> drop 'tim_test'
> > >>>>>
> > >>>>> Then things went... awry and data just started dropping out of the app.
> > >>>>> Before long, seemingly all MOB data was gone.
> > >>>>>
> > >>>>> The references in the new table's MOB folder appear to point to the
> > >>>>> source table (e.g.
> > >>>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> > >>>>>
> > >>>>> The RS logs are full of ERRORs like:
> > >>>>>
> > >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
> > >>>>> The mob file
> > >>>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
> > >>>>> could not be found in the locations
> > >>>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
> > >>>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> > >>>>>
> > >>>>> What I don't know is:
> > >>>>> 1) was this running a background task to copy the MOB data when the
> > >>>>> snapshot was cloned and I just deleted the source before the copy was
> > >>>>> complete?
> > >>>>> - or
> > >>>>> 2) when running "snapshot and clone" it just references the source MOB
> > >>>>> data until a (?) change?
> > >>>>> 3) snapshot and clone just doesn't support MOB?
> > >>>>>
> > >>>>> Can anyone shed some light on this easily before I dig into it, please?
> > >>>>>
> > >>>>> While this situation exists (at least in 1.0.0), might it be good to add
> > >>>>> information about data loss for MOB tables to the snapshot clone docs?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Tim
> >
> >
>
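The reference name Tim quotes (tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8...dfba2) suggests a layout of {sourceTable}={sourceRegion}-{mobFileName}. The Python sketch below splits such a name and rebuilds the two locations the RegionServer reported searching: the source table's mobdir and archive family directories. The name format is inferred from this one example and the error log, not from HBase documentation; `parse_link_name` and `lookup_locations` are hypothetical helpers, and the default namespace is assumed, as in this thread.

```python
import re
from typing import NamedTuple

# Inferred from the single example in this thread; an assumption, not a spec.
LINK_NAME = re.compile(r"^(?P<table>.+)=(?P<region>[0-9a-f]{32})-(?P<hfile>.+)$")


class LinkTarget(NamedTuple):
    table: str
    region: str
    hfile: str


def parse_link_name(name: str) -> LinkTarget:
    """Split a reference file name into source table, region and file (hypothetical)."""
    m = LINK_NAME.match(name)
    if m is None:
        raise ValueError(f"not a link-style name: {name!r}")
    return LinkTarget(m["table"], m["region"], m["hfile"])


def lookup_locations(root: str, target: LinkTarget, family: str,
                     namespace: str = "default") -> list[str]:
    """Directories searched for the referenced MOB file, mirroring the error log:
    the source table's live mobdir first, then its archive directory."""
    tail = f"data/{namespace}/{target.table}/{target.region}/{family}"
    return [f"{root}/mobdir/{tail}", f"{root}/archive/{tail}"]


name = ("tim_test=14bf5f1737ac65c34615ed97c0b7de06-"
        "d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2")
target = parse_link_name(name)
for loc in lookup_locations("hdfs://ha-nn/hbase", target, "EPSG_4326"):
    print(loc)
```

When the file is missing from both locations, as in Tim's log, the reference in prod_b_map is dangling, which is the situation the archive cleaners described at the top of the thread are meant to prevent.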