Date: Sat, 30 Jul 2011 11:39:09 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Message-ID: <128111895.20786.1312025949905.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1383407381.16041.1304366464597.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

[
https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073159#comment-13073159 ]

Ted Yu commented on HBASE-3845:
-------------------------------

Applied to TRUNK. TestResettingCounters passes now.
Thanks for the patch, Anirudh.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
> Key: HBASE-3845
> URL: https://issues.apache.org/jira/browse/HBASE-3845
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.3
> Reporter: Prakash Khemani
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>
> (I don't have a test case to prove this yet, but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.)
> In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably.
> After a memstore flush it is possible for lastSeqWritten to hold a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore.
> HLog.append() does a putIfAbsent into lastSeqWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore.
> Every time the memstore is flushed we remove the region's entry in lastSeqWritten and wait for the next append to populate this entry again. This is where the problem happens.
> step 1:
> flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().
> step 2:
> as soon as the updatesLock.writeLock() is released, new entries will be added into the memstore.
> step 3:
> wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.
> step 4:
> the next append will create a new entry for the region in lastSeqWritten. But this will be the log-seq-id of the current append. All the edits that were added in step 2 are missing from it.
> ==
> As a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
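
The four steps in the quoted description can be replayed deterministically in a small single-threaded sketch. This is not HBase code: the class LastSeqWrittenSketch, the method replay(), and the region name "r1" are hypothetical, and the sketch assumes that the flush event's sequence id is taken at snapshot time (step 1). It only illustrates why removing the entry can leave lastSeqWritten pointing past an unflushed edit, and why replacing it with the flush sequence id stays conservative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of the bookkeeping described in the issue.
class LastSeqWrittenSketch {
    // region name -> earliest log sequence id believed to be only in the memstore
    static final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    static long logSeqNum = 0;

    // Models the putIfAbsent in HLog.append(): only the first edit after the
    // entry was last cleared gets recorded.
    static long append(String region) {
        long seq = ++logSeqNum;
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    // Replays steps 1-4 in order and returns
    // {sequence id recorded for the region, earliest unflushed sequence id}.
    static long[] replay(boolean replaceInsteadOfRemove) {
        lastSeqWritten.clear();
        logSeqNum = 0;
        String region = "r1";

        append(region);                        // edit that ends up in the snapshot

        // step 1: snapshot the memstore; assumption: the flush event takes
        // its own sequence id at this point.
        long flushSeq = ++logSeqNum;

        // step 2: a new edit arrives after the lock is released; putIfAbsent
        // is a no-op because the old entry is still present.
        long earliestUnflushed = append(region);

        // step 3: completing the flush either removes the entry (original
        // behaviour) or replaces it with the flush sequence id (proposal).
        if (replaceInsteadOfRemove) {
            lastSeqWritten.put(region, flushSeq);
        } else {
            lastSeqWritten.remove(region);
        }

        // step 4: the next append repopulates the entry only if it is absent.
        append(region);

        return new long[] { lastSeqWritten.get(region), earliestUnflushed };
    }

    public static void main(String[] args) {
        long[] removed = replay(false);
        long[] replaced = replay(true);
        System.out.println("remove:  recorded=" + removed[0] + ", earliest unflushed=" + removed[1]);
        System.out.println("replace: recorded=" + replaced[0] + ", earliest unflushed=" + replaced[1]);
    }
}
```

With removal, the recorded id (4) is later than the earliest unflushed edit (3), so a log holding edit 3 could be treated as no longer needed. With replacement, the recorded id (2) is at or before every unflushed edit.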