Return-Path: Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: (qmail 87641 invoked from network); 26 Jan 2011 16:30:10 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 26 Jan 2011 16:30:10 -0000 Received: (qmail 5725 invoked by uid 500); 26 Jan 2011 16:30:10 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 5620 invoked by uid 500); 26 Jan 2011 16:30:08 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 5611 invoked by uid 99); 26 Jan 2011 16:30:07 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Jan 2011 16:30:07 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Jan 2011 16:30:05 +0000 Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id p0QGThAO008646 for ; Wed, 26 Jan 2011 16:29:44 GMT Message-ID: <19089407.218911296059383676.JavaMail.jira@thor> Date: Wed, 26 Jan 2011 11:29:43 -0500 (EST) From: "Kannan Muthukkaruppan (JIRA)" To: issues@hbase.apache.org Subject: [jira] Commented: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery In-Reply-To: <4326013.210121296031617614.JavaMail.jira@thor> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-3481?page=3Dcom.atlassian= .jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1298= 7083#action_12987083 ]=20 Kannan Muthukkaruppan commented on HBASE-3481: ---------------------------------------------- Dug some more... the bad MAX_SEQ_ID that the HFile got stamped with corresp= onds to an edit that looks special. {code} {code} Another interesting thing is that in the .logs, this seq id appears out o= f order... {code} ... #1081, pos=3D6403894 kannan1/kannan1,,1296023211936.dfe45b39b1d8c360b8e7f55= 4b576ab17./71170147=3D[#edits: 18 =3D <0992bdb67af8d2b2172618496243433c:287= 686/actions:0\/1296023308555/Put/vlen=3D597; 0992bdb67af8d2b217261849624343= 3c:287686/actions:1/1296023308555/Put/vlen=3D416; 0992bdb67af8d2b2172618496= 243433c:287686/actions:2\/1296023308555/Put/vlen=3D436; 0992bdb67af8d2b2172= 618496243433c:287686/actions:3/1296023308555/Put/vlen=3D329; 0992bdb67af8d2= b2172618496243433c:287686/actions:4\/1296023308555/Put/vlen=3D256; 0992bdb6= 7af8d2b2172618496243433c:287686/actions:5/1296023308555/Put/vlen=3D792; 099= 2bdb67af8d2b2172618496243433c:287686/actions:6\/1296023308555/Put/vlen=3D23= 1; 0992bdb67af8d2b2172618496243433c:287686/actions:7/1296023308555/Put/vlen= =3D790; 0992bdb67af8d2b2172618496243433c:287686/actions:8\/1296023308555/Pu= t/vlen=3D800; 0992bdb67af8d2b2172618496243433c:287686/actions:9/12960233085= 55/Put/vlen=3D664; 0992bdb67af8d2b2172618496243433c:287686/actions:10\/1296= 023308555/Put/vlen=3D618; 0992bdb67af8d2b2172618496243433c:287686/actions:1= 1/1296023308555/Put/vlen=3D533; 0992bdb67af8d2b2172618496243433c:287686/act= ions:12\/1296023308555/Put/vlen=3D636; 0992bdb67af8d2b2172618496243433c:287= 686/actions:13/1296023308555/Put/vlen=3D554; 0992bdb67af8d2b217261849624343= 3c:287686/actions:14\/1296023308555/Put/vlen=3D353; 0992bdb67af8d2b21726184= 96243433c:287686/actions:15/1296023308555/Put/vlen=3D307; 0992bdb67af8d2b21= 72618496243433c:287686/actions:16\/1296023308555/Put/vlen=3D410; 0992bdb67a= f8d2b2172618496243433c:287686/actions:17/1296023308555/Put/vlen=3D579; >] #1082, pos=3D6404082 kannan1/kannan1,bbbbbbbb,1296023211937.0a3f2362b0264b0= 192c311f10c006d84./71169066{=3D[#edits: 1 =3D ] #1083, pos=3D6406845 kannan1/kannan1,99999999,1296023211936.8ff83822490b1f8= 175dc16aaf4c170a6./71170148=3D[#edits: 4 =3D ] {code} The flushed file's MAX_SEQ_ID is recorded as: 71169066 (corresponding to en= try #1082 above). However that edit is not a regular put key, but this spe= cial row ] Also, notice out the sequence ids for edits surrounding it(entries #1081 & = #1083) are both higher (namely, 71170147 & 71170148). Looks like this sequence id is being obtained outside of the updatesLock. B= ut haven't confirmed that in the code. > max seq id in flushed file can be larger than its correct value causing d= ata loss during recovery > -------------------------------------------------------------------------= ------------------------ > > Key: HBASE-3481 > URL: https://issues.apache.org/jira/browse/HBASE-3481 > Project: HBase > Issue Type: Bug > Reporter: Kannan Muthukkaruppan > Priority: Critical > > [While doing some cluster kill tests, I noticed some missing data after l= og recovery. Upon investigating further, and pretty printing contents of HF= iles and recovered logs, this is my analysis of the situation/bug. Please c= onfirm the theory and pitch in with suggestions.] > When memstores are flushed, the max sequence id recorded in the HFile sh= ould be the max sequence id of all KVs in the memstore. However, we seem to= simply obtain the current sequence id from the HRegion, and stamp the HFil= e's MAX_SEQ_ID with it. > From HRegion.java: > {code} > sequenceId =3D (wal =3D=3D null)? myseqid: wal.startCacheFlush(); > {code} > where, startCacheFlush() is: > {code} > public long startCacheFlush() { > this.cacheFlushLock.lock(); > return obtainSeqNum(); > } > {code} > where, obtainSeqNum() is simply:=20 > {code} > private long obtainSeqNum() { > return this.logSeqNum.incrementAndGet(); > } > {code} > So let's say a memstore contains edits with sequence number 1..10. > Meanwhile, say more Puts come along, and are going through this flow (in = pseudo-code) > {code} > 1. HLog.append(); > 1.1 obtainSeqNum() > 1.2 writeToWAL() > 2 updateMemStore() > {code} > So it is possible that the sequence number has already been incremented t= o say 15 if there are 5 more outstanding puts. Say the writeToWAL() is stil= l in progress for these puts. In this case, none of these edits (11..15) wo= uld have been written to memstore yet. > At this point if a cache flush of the memstore happens, then we'll record= its MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what = obtainSeqNum() would return as the next sequence number to use, right?). > Assume that the edits 11..15 eventually complete. And so HLogs do contain= the data for edits 11..15. > Now, at this point if the region server were to crash, and we run log rec= overy, the splits all go through correctly, and a correct recovered.edits f= ile is generated with the edits 11..15.=20 > Next, when the region is opened, the HRegion notes that one of the store = file says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, i= t skips replaying edits 11..15. Or in other words, data loss. > ---- --=20 This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.