From: Tim Robertson
Date: Thu, 13 Oct 2016 10:48:07 +0200
Subject: Re: Data loss in MOB snapshot and clone?
To: dev@hbase.apache.org

Thanks Jingcheng

Yes, it just references the source MOB data until MOB compaction.

Based on that, I think this really is a critical bug. It allowed the MOB files to be deleted before that compaction happened, resulting in broken references and data loss.

Or am I misunderstanding you?

On Thu, Oct 13, 2016 at 9:45 AM, Du, Jingcheng wrote:

> Hi Tim,
>
> > was this running a background task to copy the MOB data when the snapshot was cloned and I just deleted the source before the copy was complete?
> The MOB data can be copied when MOB compaction happens. But the MOB files should not be deleted, even if they have not been copied and the source table has been deleted. The archive cleaner should keep them until all the references are gone. Let me check the code again.
>
> > when running "snapshot and clone" it just references the source MOB data until a (?) change?
> Yes, it just references the source MOB data until MOB compaction.
>
> > snapshot and clone just doesn't support MOB?
> It is supported.
>
> Regards,
> Jingcheng
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson100@gmail.com]
> Sent: Thursday, October 13, 2016 1:56 AM
> To: dev@hbase.apache.org
> Subject: Re: Data loss in MOB snapshot and clone?
>
> Thanks - well it is now on the CDH community forum too.
>
> Jonathan Hsieh pretty much described what I see in his comment on HBASE-12332:
> https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
>
> On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun wrote:
>
> > Hi Tim,
> >
> > Having read more details, it may not be related to the issue we fixed (MOB compaction related).
> > I am doing a similar test to see if I can reproduce it.
> >
> > Thanks,
> > Huaxiang
> >
> > > On Oct 12, 2016, at 10:29 AM, Tim Robertson wrote:
> > >
> > > Thanks Ted, Huaxiang
> > >
> > > I'll move this to a Cloudera forum and comment back here if it appears unrelated.
> > >
> > > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun wrote:
> > >
> > >> By the way, I forgot the forum link: http://community.cloudera.com
> > >>
> > >> Thanks,
> > >> Huaxiang
> > >>
> > >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > >>>
> > >>> Hi Tim,
> > >>>
> > >>> I believe that it runs into an issue, specific to the Cloudera release, that we fixed recently. For details, could you discuss it in the CDH forum?
> > >>> Copy me (hsun@cloudera.com) in the forum so I can explain more there.
> > >>>
> > >>> Thanks,
> > >>> Huaxiang
> > >>>
> > >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>>
> > >>>> Have you looked at HBASE-16578?
> > >>>>
> > >>>> Cheers
> > >>>>
> > >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi devs,
> > >>>>> [Had a quick chat with Lars G. about this and before opening a Jira I thought I'd raise it here first]
> > >>>>>
> > >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> > >>>>>
> > >>>>> Before I dig into this further, I'd like to just ask if anyone has seen this before?
> > >>>>>
> > >>>>> The initial state was a table (tim_test) built with MOB support, with a few tens of millions of rows and tens of billions of cells.
> > >>>>>
> > >>>>> I wanted to rename the table to get this into production and did so as follows:
> > >>>>>
> > >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> > >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> > >>>>>
> > >>>>> At this stage the application all looked good, and so I continued with:
> > >>>>>
> > >>>>> delete_snapshot 'tim_test-snapshot'
> > >>>>> disable 'tim_test'
> > >>>>> drop 'tim_test'
> > >>>>>
> > >>>>> Then things went... awry, and data just started dropping out in the app. Before long, all MOB data seemed to be gone.
> > >>>>>
> > >>>>> The references in the new table's MOB folder appear to point to the source table (e.g. /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> > >>>>>
> > >>>>> The RS logs are full of ERRORs like:
> > >>>>>
> > >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore: The mob file d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48 could not be found in the locations [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326, hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> > >>>>>
> > >>>>> What I don't know is:
> > >>>>> 1) was this running a background task to copy the MOB data when the snapshot was cloned, and I just deleted the source before the copy was complete?
> > >>>>> - or
> > >>>>> 2) when running "snapshot and clone", does it just reference the source MOB data until a (?) change?
> > >>>>> 3) does snapshot and clone just not support MOB?
> > >>>>>
> > >>>>> Can anyone easily shed some light on this before I dig into it, please?
> > >>>>>
> > >>>>> While this situation exists (at least in 1.0.0), might it be good to add information about data loss for MOB tables to the snapshot clone docs?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Tim
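The retention rule Jingcheng describes (the clone only holds links to the source MOB files until MOB compaction, and the archive cleaner should keep archived files until no reference to them remains) can be sketched as a small reference-tracking store. This is a toy model under stated assumptions, not HBase's implementation: the class ToyMobStore, its methods, and the honour_links flag are all hypothetical. Passing honour_links: false models a cleaner that ignores the clone's links, which would leave the clone pointing at files that no longer exist.

```ruby
require 'set'

# Toy model only (NOT HBase's implementation) of the retention rule in this
# thread: a cloned table holds links to the source table's MOB files, and an
# archived MOB file may be cleaned only when no existing table links to it.
class ToyMobStore
  def initialize
    @tables  = Set.new                             # tables that currently exist
    @mobdir  = Hash.new { |h, k| h[k] = Set.new }  # table -> files under mobdir
    @archive = Hash.new { |h, k| h[k] = Set.new }  # table -> files under archive
    @links   = Hash.new { |h, k| h[k] = Set.new }  # [table, file] -> linking tables
  end

  def create(table, files)
    @tables << table
    @mobdir[table].merge(files)
  end

  # Like clone_snapshot: the clone only records links; no MOB data is copied.
  def clone_table(source, target)
    @tables << target
    @mobdir[source].each { |f| @links[[source, f]] << target }
  end

  # Like disable + drop: the table's MOB files are moved to the archive.
  def drop(table)
    @tables.delete(table)
    @archive[table].merge(@mobdir.delete(table) || [])
  end

  # honour_links: false is a hypothetical faulty cleaner that ignores links.
  def clean_archive(honour_links: true)
    @archive.each do |table, files|
      files.reject! { |f| !honour_links || (@links[[table, f]] & @tables).empty? }
    end
  end

  # Resolve a linked file in the order the log shows: source mobdir, then archive.
  def resolve(source, file)
    return :mobdir if @mobdir.fetch(source, Set.new).include?(file)
    return :archive if @archive.fetch(source, Set.new).include?(file)
    nil # the "could not be found in the locations [...]" case
  end
end

store = ToyMobStore.new
store.create('tim_test', ['mob-file-1'])
store.clone_table('tim_test', 'prod_b_map')
store.drop('tim_test')
store.clean_archive
p store.resolve('tim_test', 'mob-file-1')   # => :archive (kept for prod_b_map)
store.clean_archive(honour_links: false)
p store.resolve('tim_test', 'mob-file-1')   # => nil (the reported data loss)
```

In this model a correct cleaner keeps the dropped table's MOB file in the archive for as long as prod_b_map links to it; only a cleaner that ignores links produces the missing-file lookups seen in the RS log.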
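The reference path in the new table's MOB folder and the RegionServer error can be tied together with a small parser. This is a sketch under stated assumptions: the link name is taken to be <table>=<region>-<mob file> in the default namespace (as in the tim_test=... path), the two lookup directories are rebuilt from the locations printed in the log, and MOB file names are assumed to be MD5(start key) + yyyyMMdd date + UUID. The helpers parse_link, candidate_dirs and parse_mob_file are hypothetical names, not HBase APIs.

```ruby
require 'digest/md5'

# Hypothetical helper: splits a link name "<table>=<region>-<mob file>".
# Assumes the default namespace (no "<namespace>=" prefix).
def parse_link(name)
  table, rest = name.split('=', 2)
  region, file = rest.split('-', 2)
  { table: table, region: region, file: file }
end

# Hypothetical helper: rebuilds the two directories printed in the RS log.
def candidate_dirs(root, link, family, namespace: 'default')
  rel = "data/#{namespace}/#{link[:table]}/#{link[:region]}/#{family}"
  ["#{root}/mobdir/#{rel}", "#{root}/archive/#{rel}"]
end

# Hypothetical helper, assuming MOB file names are
# MD5(start key) (32 hex chars) + yyyyMMdd date (8 chars) + UUID (32 hex chars).
def parse_mob_file(name)
  { start_key_md5: name[0, 32], date: name[32, 8], uuid: name[40, 32] }
end

link = parse_link('tim_test=14bf5f1737ac65c34615ed97c0b7de06-' \
                  'd41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2')
puts candidate_dirs('hdfs://ha-nn/hbase', link, 'EPSG_4326')
mob = parse_mob_file(link[:file])
p mob[:date]                                        # => "20161006"
p mob[:start_key_md5] == Digest::MD5.hexdigest('')  # => true
```

Under those assumptions, both candidate directories sit under the dropped tim_test table, which is consistent with the lookups failing once that table's MOB files were removed; the first 32 characters of the referenced file also equal the MD5 of an empty string, i.e. an empty start key.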