Date: Fri, 23 Sep 2016 17:53:20 +0000 (UTC)
From: "Matteo Bertozzi (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

     [ https://issues.apache.org/jira/browse/HBASE-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-16649:
------------------------------------
    Attachment: HBASE-16649-v2.patch

> Truncate table with splits preserved can cause both data loss and
> truncated data appeared again
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16649
>                 URL: https://issues.apache.org/jira/browse/HBASE-16649
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.3
>            Reporter: Allan Yang
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-16649-v0.patch, HBASE-16649-v1.patch, HBASE-16649-v2.patch
>
>
> Truncating a table with splits preserved deletes the hfiles but reuses the previous regioninfo. This can cause odd behaviors.
> - Case 1: *Data reappears after truncate*
> Steps to reproduce:
> 1. Create a table, let's say 'test'.
> 2. Write data to 'test' and make sure the memstore of 'test' is not empty.
> 3. Truncate 'test' with splits preserved.
> 4. Kill the regionserver hosting the region(s) of 'test'.
> 5. Start the regionserver. Now it is time to witness the miracle: the truncated data appears in table 'test' again.
> - Case 2: *Data loss*
> Steps to reproduce:
> 1. Create a table, let's say 'test'.
> 2. Write some data to 'test', no matter how much.
> 3. Truncate 'test' with splits preserved.
> 4. Restart the regionserver to reset the seqid.
> 5. Write some data, but less than in step 2, since we don't want the seqid to run past the one reached in step 2.
> 6. Kill the regionserver hosting the region(s) of 'test'.
> 7. Restart the regionserver. Congratulations! The data written in step 5 is now all lost.
> *Why?*
> For case 1:
> Preserving splits in the truncate table procedure does not change the regioninfo, so when log replay happens, the 'unflushed' data is restored back to the region.
> For case 2:
> The flushedSequenceIdByRegion values are stored in the Master in a map keyed by the region's encodedName. Although the table is truncated, the region's name does not change, since we chose to preserve the splits. So after the table is truncated, the region's sequenceid is reset in the regionserver, but not reset in the master.
> When a flush happens and is reported to the master, the master rejects the sequenceid update because the new one is smaller than the old one. The same happens in log replay: all the edits written in step 5 are skipped because they have a smaller seqid.
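
For readers who want to see the two failure modes side by side, here is a minimal toy model of the bookkeeping described in the issue. It is only a sketch under stated assumptions: `ToyMaster`, `report_flush`, `edits_to_replay`, and the region name `region-a` are hypothetical names invented for illustration, not HBase APIs. The only idea taken from the issue text is a map from encoded region name to last flushed sequence id, with replay skipping edits that are not newer than that id.

```python
# Toy model only: none of these classes or functions are HBase APIs.


class ToyMaster:
    """Tracks the last flushed sequence id per encoded region name."""

    def __init__(self):
        # Mirrors the idea of flushedSequenceIdByRegion from the issue text.
        self.flushed_seq_id_by_region = {}

    def report_flush(self, encoded_name, seq_id):
        """Record a flush; refuse to move the stored sequence id backwards."""
        current = self.flushed_seq_id_by_region.get(encoded_name, -1)
        if seq_id < current:
            return False  # update rejected, the stale value is kept
        self.flushed_seq_id_by_region[encoded_name] = seq_id
        return True

    def edits_to_replay(self, encoded_name, wal_edits):
        """Keep only WAL edits newer than the recorded flushed sequence id."""
        flushed = self.flushed_seq_id_by_region.get(encoded_name, -1)
        return [(seq, row) for seq, row in wal_edits if seq > flushed]


def case_1_data_reappears():
    master = ToyMaster()
    region = "region-a"  # hypothetical encoded name, unchanged by the truncate
    # The edits were never flushed, so they still exist in the log.
    wal = [(1, "row1"), (2, "row2")]
    # Truncate with splits preserved keeps the region name, so after the
    # regionserver is killed, replay matches these edits to the same region.
    return master.edits_to_replay(region, wal)


def case_2_data_lost():
    master = ToyMaster()
    region = "region-a"
    master.report_flush(region, 10)  # flushed state from before the truncate
    # After the truncate and a regionserver restart, the seqid starts over.
    wal = [(1, "new1"), (2, "new2"), (3, "new3")]
    # A flush report with the smaller seqid would be rejected by the master.
    accepted = master.report_flush(region, 3)
    # Replay after a crash skips every new edit, since 1..3 are all <= 10.
    return accepted, master.edits_to_replay(region, wal)


if __name__ == "__main__":
    print(case_1_data_reappears())  # [(1, 'row1'), (2, 'row2')]
    print(case_2_data_lost())       # (False, [])
```

In this model both cases come from the same cause: state keyed by the region's name outlives the truncate because the name does not change.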