Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <19089407.218911296059383676.JavaMail.jira@thor>
Date: Wed, 26 Jan 2011 11:29:43 -0500 (EST)
From: "Kannan Muthukkaruppan (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-3481) max seq id in flushed file can be
 larger than its correct value causing data loss during recovery
In-Reply-To: <4326013.210121296031617614.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HBASE-3481?page=3Dcom.atlassian=
.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1298=
7083#action_12987083 ]=20

Kannan Muthukkaruppan commented on HBASE-3481:
----------------------------------------------

Dug some more... the bad MAX_SEQ_ID that the HFile got stamped with corresp=
onds to an edit that looks special.

{code}
<METAROW/METAFAMILY:/1296023308559/Put/vlen=3D17; >
{code}

Another interesting thing is that in the  .logs, this seq id  appears out o=
f order...

{code}
...
#1081, pos=3D6403894 kannan1/kannan1,,1296023211936.dfe45b39b1d8c360b8e7f55=
4b576ab17./71170147=3D[#edits: 18 =3D <0992bdb67af8d2b2172618496243433c:287=
686/actions:0\/1296023308555/Put/vlen=3D597; 0992bdb67af8d2b217261849624343=
3c:287686/actions:1/1296023308555/Put/vlen=3D416; 0992bdb67af8d2b2172618496=
243433c:287686/actions:2\/1296023308555/Put/vlen=3D436; 0992bdb67af8d2b2172=
618496243433c:287686/actions:3/1296023308555/Put/vlen=3D329; 0992bdb67af8d2=
b2172618496243433c:287686/actions:4\/1296023308555/Put/vlen=3D256; 0992bdb6=
7af8d2b2172618496243433c:287686/actions:5/1296023308555/Put/vlen=3D792; 099=
2bdb67af8d2b2172618496243433c:287686/actions:6\/1296023308555/Put/vlen=3D23=
1; 0992bdb67af8d2b2172618496243433c:287686/actions:7/1296023308555/Put/vlen=
=3D790; 0992bdb67af8d2b2172618496243433c:287686/actions:8\/1296023308555/Pu=
t/vlen=3D800; 0992bdb67af8d2b2172618496243433c:287686/actions:9/12960233085=
55/Put/vlen=3D664; 0992bdb67af8d2b2172618496243433c:287686/actions:10\/1296=
023308555/Put/vlen=3D618; 0992bdb67af8d2b2172618496243433c:287686/actions:1=
1/1296023308555/Put/vlen=3D533; 0992bdb67af8d2b2172618496243433c:287686/act=
ions:12\/1296023308555/Put/vlen=3D636; 0992bdb67af8d2b2172618496243433c:287=
686/actions:13/1296023308555/Put/vlen=3D554; 0992bdb67af8d2b217261849624343=
3c:287686/actions:14\/1296023308555/Put/vlen=3D353; 0992bdb67af8d2b21726184=
96243433c:287686/actions:15/1296023308555/Put/vlen=3D307; 0992bdb67af8d2b21=
72618496243433c:287686/actions:16\/1296023308555/Put/vlen=3D410; 0992bdb67a=
f8d2b2172618496243433c:287686/actions:17/1296023308555/Put/vlen=3D579; >]

#1082, pos=3D6404082 kannan1/kannan1,bbbbbbbb,1296023211937.0a3f2362b0264b0=
192c311f10c006d84./71169066{=3D[#edits: 1 =3D <METAROW/METAFAMILY:/12960233=
08559/Put/vlen=3D17; >]

#1083, pos=3D6406845 kannan1/kannan1,99999999,1296023211936.8ff83822490b1f8=
175dc16aaf4c170a6./71170148=3D[#edits: 4 =3D <a2adc743be4a350b7009af5143acf=
bf3:287685/actions:0/1296023308555/Put/vlen=3D552; a2adc743be4a35\
0b7009af5143acfbf3:287685/actions:1/1296023308555/Put/vlen=3D715; a2adc743b=
e4a350b7009af5143acfbf3:287685/actions:2/1296023308555/Put/vlen=3D806; a2ad=
c743be4a350b7009af5143acfbf3:287685/actions:3/1296023308555/P\
ut/vlen=3D296; >]
{code}


The flushed file's MAX_SEQ_ID is recorded as: 71169066 (corresponding to en=
try #1082 above). However that edit is not a regular put  key, but this spe=
cial row <METAROW/METAFAMILY:/1296023308559/Put/vlen=3D17; >]

Also, notice out the sequence ids for edits surrounding it(entries #1081 & =
#1083) are both higher (namely, 71170147 & 71170148).

Looks like this sequence id is being obtained outside of the updatesLock. B=
ut haven't confirmed that in the code.


> max seq id in flushed file can be larger than its correct value causing d=
ata loss during recovery
> -------------------------------------------------------------------------=
------------------------
>
>                 Key: HBASE-3481
>                 URL: https://issues.apache.org/jira/browse/HBASE-3481
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Priority: Critical
>
> [While doing some cluster kill tests, I noticed some missing data after l=
og recovery. Upon investigating further, and pretty printing contents of HF=
iles and recovered logs, this is my analysis of the situation/bug. Please c=
onfirm the theory and pitch in with suggestions.]
> When memstores are flushed, the max sequence id recorded in  the HFile sh=
ould be the max sequence id of all KVs in the memstore. However, we seem to=
 simply obtain the current sequence id from the HRegion, and stamp the HFil=
e's MAX_SEQ_ID with it.
> From HRegion.java:
> {code}
>     sequenceId =3D (wal =3D=3D null)? myseqid: wal.startCacheFlush();
> {code}
> where, startCacheFlush() is:
> {code}
> public long startCacheFlush() {
>     this.cacheFlushLock.lock();
>     return obtainSeqNum();
>  }
> {code}
> where, obtainSeqNum() is simply:=20
> {code}
> private long obtainSeqNum() {
>     return this.logSeqNum.incrementAndGet();
>   }
> {code}
> So let's say a memstore contains edits with sequence number 1..10.
> Meanwhile, say more Puts come along, and are going through this flow (in =
pseudo-code)
> {code}
> 1. HLog.append();
>        1.1  obtainSeqNum()
>        1.2 writeToWAL()
> 2 updateMemStore()
> {code}
> So it is possible that the sequence number has already been incremented t=
o say 15 if there are 5 more outstanding puts. Say the writeToWAL() is stil=
l in progress for these puts. In this case, none of these edits (11..15) wo=
uld have been written to memstore yet.
> At this point if a cache flush of the memstore happens, then we'll record=
 its MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what =
obtainSeqNum() would return as the next sequence number to use, right?).
> Assume that the edits 11..15 eventually complete. And so HLogs do contain=
 the data for edits 11..15.
> Now, at this point if the region server were to crash, and we run log rec=
overy, the splits all go through correctly, and a correct recovered.edits f=
ile is generated with the edits 11..15.=20
> Next, when the region is opened, the HRegion notes that one of the store =
file says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, i=
t  skips replaying edits 11..15. Or in other words, data loss.
> ----

--=20
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.