hbase-dev mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: Data loss in MOB snapshot and clone?
Date Wed, 12 Oct 2016 17:56:26 GMT
Thanks - well it is now on the CDH community forum too.

Jonathan Hsieh pretty much described what I see in his comment on
HBASE-12332
https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
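
To illustrate why the cloned table breaks once the source is dropped, here is a minimal sketch in Python (not HBase code). It assumes the reference file names follow the `<source table>=<region>-<mob file>` shape seen in the prod_b_map path quoted further down, and that the table is in the default namespace. The names `MobLink`, `parse_mob_link` and `candidate_locations` are hypothetical; the two candidate directories simply mirror the two locations listed in the HStore error quoted further down.

```python
from typing import List, NamedTuple


class MobLink(NamedTuple):
    """Parts of a link-style MOB reference name (hypothetical helper type)."""
    table: str
    region: str
    mob_file: str


def parse_mob_link(name: str) -> MobLink:
    """Split '<table>=<region>-<mob file>' into its parts.

    Assumption: default namespace, so there is exactly one '=' separator.
    """
    table, eq, rest = name.partition("=")
    region, dash, mob_file = rest.partition("-")
    if not (eq and dash and table and region and mob_file):
        raise ValueError(f"not a link-style name: {name!r}")
    return MobLink(table, region, mob_file)


def candidate_locations(link: MobLink, family: str,
                        root: str = "/hbase",
                        namespace: str = "default") -> List[str]:
    """Directories where the referenced file would be looked up.

    Assumption: layout copied from the error message in this thread
    (mobdir first, then archive), both under the SOURCE table's name.
    """
    tail = f"data/{namespace}/{link.table}/{link.region}/{family}"
    return [f"{root}/mobdir/{tail}", f"{root}/archive/{tail}"]


if __name__ == "__main__":
    link = parse_mob_link(
        "tim_test=14bf5f1737ac65c34615ed97c0b7de06-"
        "d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2"
    )
    for path in candidate_locations(link, "EPSG_4326"):
        print(path)
```

Both candidate directories sit under tim_test rather than prod_b_map, so if the files are removed from those locations after the source table is dropped, the clone has nothing left to resolve.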



On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun <hsun@cloudera.com> wrote:

> Hi Tim,
>
> Just read more details; it may not be related to the issue we fixed (MOB
> compaction related).
> I am doing a similar test to see if I can reproduce it.
>
> Thanks,
> Huaxiang
> > On Oct 12, 2016, at 10:29 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> >
> > Thanks Ted, Huaxiang
> >
> > I'll move this to a Cloudera forum and comment back here if it appears
> > unrelated.
> >
> > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun <hsun@cloudera.com> wrote:
> >
> >> By the way, I forgot the forum link: http://community.cloudera.com/
> >>
> >> Thanks,
> >> Huaxiang
> >>
> >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <hsun@cloudera.com> wrote:
> >>>
> >>> Hi Tim,
> >>>
> >>>   I believe you have run into an issue specific to the Cloudera release,
> >>> which we fixed recently. For details, could you discuss it in the CDH forum?
> >>> Copy me (hsun@cloudera.com) in the forum so I can explain more there.
> >>>
> >>>   Thanks,
> >>>   Huaxiang
> >>>
> >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>
> >>>> Have you looked at HBASE-16578 ?
> >>>>
> >>>> Cheers
> >>>>
> >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> >>>>>
> >>>>> Hi devs,
> >>>>> [Had a quick chat with Lars G. about this and before opening a Jira I
> >>>>> thought I'd raise it here first]
> >>>>>
> >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> >>>>>
> >>>>> Before I dig into this further, I'd like to just ask if anyone has seen
> >>>>> this before?
> >>>>>
> >>>>> The initial state was a table (tim_test) built with MOB support and a few
> >>>>> tens of millions of rows and tens of billions of cells.
> >>>>>
> >>>>> I wanted to rename the table to get this into production and did so as
> >>>>> follows:
> >>>>>
> >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> >>>>>
> >>>>> At this stage the application all looked good, and so I continued with:
> >>>>>
> >>>>> delete_snapshot 'tim_test-snapshot'
> >>>>> disable 'tim_test'
> >>>>> drop 'tim_test'
> >>>>>
> >>>>> Then things went... awry and data just started dropping out in the app.
> >>>>> Before long, all MOB data was seemingly gone.
> >>>>>
> >>>>> The references in the new table MOB folder appear to point to the source
> >>>>> table (e.g.
> >>>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> >>>>>
> >>>>> The RS logs are full of ERRORs like:
> >>>>>
> >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
> >>>>> The mob file
> >>>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
> >>>>> could not be found in the locations
> >>>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
> >>>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> >>>>>
> >>>>> What I don't know is:
> >>>>> 1) was this running a background task to copy the MOB data when the
> >>>>> snapshot was cloned and I just deleted the source before the copy was
> >>>>> complete?
> >>>>> - or
> >>>>> 2) when running "snapshot and clone" it just references the source MOB
> >>>>> data until a (?) change?
> >>>>> - or
> >>>>> 3) snapshot and clone just doesn't support MOB?
> >>>>>
> >>>>> Can anyone shed some light on this easily before I dig into it please?
> >>>>>
> >>>>> While this situation exists (at least in 1.0.0) might it be good to get
> >>>>> info about data loss for MOB tables into the snapshot clone docs?
> >>>>>
> >>>>> Thanks,
> >>>>> Tim
>
>
