From: lars hofhansl
Date: Fri, 3 Aug 2012 13:05:11 -0700 (PDT)
Subject: Re: memstore timestamp and visible timestamp
To: "dev@hbase.apache.org"
Cc: "hbase-dev@hadoop.apache.org"

"  I assume that this applies ONLY when we talk about two KVs in the SAME row?"

Possibly... I would need to look at the code a bit more closely. Since HBase only guarantees ACID within a row, it should not matter.
(Well, except for the work I did in HBASE-5229.)

-- Lars


----- Original Message -----
From: Wei Tan
To: dev@hbase.apache.org
Cc: "hbase-dev@hadoop.apache.org"
Sent: Friday, August 3, 2012 12:43 PM
Subject: Re: memstore timestamp and visible timestamp

Hi Lars,

  "Since the region server also hands out the TSs based on wall clock time (and assuming time does not go backwards) it follows that a KV assigned a later memTS cannot have an earlier TS."

  I assume that this applies ONLY when we talk about two KVs in the SAME row? I read the code of put() and found that a put first locks the row, then assigns the TS, and later assigns the memTS. This makes sense, since only after this put is done can another put obtain the row lock, and therefore a larger TS and memTS will be obtained.

  However, this does NOT hold for two KVs that belong to different rows, right? Say we have two KVs: KV1 can enter the put earlier and get a smaller TS1, but it can be delayed a little in the code path and possibly get a memTS after KV2, correct?

  Again, thanks :-)

Best Regards,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
19 Skyline Dr, Hawthorne, NY 10532
wtan@us.ibm.com; 914-784-6752


From: lars hofhansl
To: "dev@hbase.apache.org"
Cc: "hbase-dev@hadoop.apache.org"
Date: 08/03/2012 03:14 PM
Subject: Re: memstore timestamp and visible timestamp


I see.
This is not so much a stated guarantee as a fact that follows from the implementation.

The memTS is handed out per region server, which is fine, because the only consistency guarantee HBase makes is for KVs of the same row, and these are always colocated in the same region (and hence the same region server).
Since the region server also hands out the TSs based on wall clock time (and assuming time does not go backwards), it follows that a KV assigned a later memTS cannot have an earlier TS.

Of course that is not the case if you use client-assigned TSs.

Maybe I should write a follow-up blog post that more clearly describes the relationship (or rather the absence thereof) between the memTS and the TS.

The gist is that the memTS is strictly internal and exists to guarantee ACID properties (HBase could just as well have used read locks for this, and if it did, that would be transparent to the outside), whereas the TS is an application-level concept: it is part of the data (so to speak).

-- Lars
________________________________
From: Wei Tan
To: dev@hbase.apache.org
Cc: "hbase-dev@hadoop.apache.org"
Sent: Friday, August 3, 2012 7:21 AM
Subject: Re: memstore timestamp and visible timestamp

Hi Lars,

  I appreciate your reply. Actually, I read your blog post and then had that question.
I am very interested in how you guarantee this:

  Also note that if you use the Region Server assigned TSs then mTS1

To: "dev@hbase.apache.org", "hbase-dev@hadoop.apache.org"
Date: 08/02/2012 07:35 PM
Subject: Re: memstore timestamp and visible timestamp


Hi Wei,

You have to distinguish between "visible to other concurrent scanners" and "visible to a client".
What is visible to a client is determined by what the client wants to see, based on the application-visible timestamp (TS).

Visibility to concurrent scanners is controlled by the memstoreTS (mTS), to avoid "strange" states due to parallel updates.
Here HBase guards against partially visible "transactions" (e.g. a Put of many columns that fails after it has applied the changes to some of the columns).

The scenario you describe below is indeed desired. Note that a client can request the older versions too, so the older edit (in terms of TS) is not lost.
Also note that if you use the Region Server assigned TSs then mTS1

To: hbase-dev@hadoop.apache.org
Cc:
Sent: Thursday, August 2, 2012 3:35 PM
Subject: memstore timestamp and visible timestamp

Hi,

  I have a question regarding the correlation between the visible timestamp of a KV (denoted as ts) and its memstore timestamp (aka the write number, denoted as memts). Reading the HRegion.java code, it seems that these two are assigned independently. Let's assume two concurrent puts: (k, v1) and (k, v2).

  Suppose somehow memts(k,v1) < memts(k,v2); then (k,v1) will be committed and visible before (k,v2).
If ts(k,v1) < ts(k,v2), then after both KVs commit, (k,v2) becomes the latest version.
Else, if ts(k,v1) > ts(k,v2), then after the "later" (w.r.t. MVCC) KV commits, it immediately becomes stale and is still not visible.
Is this a desirable feature?

  Am I understanding it correctly that memts(k,v1) < memts(k,v2) does not imply ts(k,v1) < ts(k,v2), and vice versa?
PS: let's talk about the visible timestamp assigned by the HBase region server, not a user-assigned one.

  Thanks,

Wei

Best Regards,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
19 Skyline Dr, Hawthorne, NY 10532
wtan@us.ibm.com; 914-784-6752
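The ordering argument in this thread (take the row lock, assign the TS, then assign the memTS) can be sketched as a small Java model. This is a hypothetical illustration, not HBase code: RowOrderingSketch, begin/complete/put, and the counter-based clock are invented here, and the clock is assumed never to go backwards, as Lars stipulates. It checks that for puts to one row a later memTS never carries an earlier TS, and replays Wei's cross-row interleaving in which it does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical toy model of the put path described in this thread; NOT HBase code.
public class RowOrderingSketch {
    // One applied edit: its row, visible timestamp (TS) and memstore write number (memTS).
    record Edit(String row, long ts, long memTs) {}

    // A put that holds its row lock and already has a TS, but no memTS yet.
    record PendingPut(String row, long ts, ReentrantLock lock) {}

    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final AtomicLong writeNumber = new AtomicLong(); // region-wide source of memTS
    private final LongSupplier clock;                         // assumed never to go backwards

    RowOrderingSketch(LongSupplier clock) { this.clock = clock; }

    // Steps 1 and 2: take the row lock, then read the TS from the clock.
    PendingPut begin(String row) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();
        return new PendingPut(row, clock.getAsLong(), lock);
    }

    // Step 3: assign the memTS, then release the row lock.
    Edit complete(PendingPut p) {
        try {
            return new Edit(p.row(), p.ts(), writeNumber.incrementAndGet());
        } finally {
            p.lock().unlock();
        }
    }

    Edit put(String row) { return complete(begin(row)); }

    // Many threads put to ONE row; returns all edits sorted by memTS.
    static List<Edit> sameRowStress(int threads, int putsPerThread) throws InterruptedException {
        AtomicLong ticks = new AtomicLong();
        // Integer division lets several puts share a TS, like a millisecond wall clock.
        RowOrderingSketch region = new RowOrderingSketch(() -> ticks.incrementAndGet() / 4);
        List<Edit> edits = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    edits.add(region.put("k"));
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingLong(Edit::memTs));
        return sorted;
    }

    // Scripted interleaving on two DIFFERENT rows; returns {edit on row1, edit on row2}.
    static Edit[] crossRowScenario() {
        AtomicLong ticks = new AtomicLong(100);
        RowOrderingSketch region = new RowOrderingSketch(ticks::incrementAndGet);
        PendingPut slow = region.begin("row1"); // TS = 101, then delayed before getting a memTS
        Edit fast = region.put("row2");         // TS = 102, memTS = 1
        Edit late = region.complete(slow);      // memTS = 2, yet TS = 101 < 102
        return new Edit[] {late, fast};
    }

    public static void main(String[] args) throws InterruptedException {
        List<Edit> sameRow = sameRowStress(4, 10_000);
        boolean ordered = true;
        for (int i = 1; i < sameRow.size(); i++) {
            ordered &= sameRow.get(i - 1).ts() <= sameRow.get(i).ts();
        }
        System.out.println("same row, later memTS never has an earlier TS: " + ordered);
        Edit[] pair = crossRowScenario();
        System.out.println("different rows: " + pair[0] + " vs " + pair[1]);
    }
}
```

Because the TS read and the memTS increment both happen while the same row lock is held, puts to one row receive both numbers in lock-acquisition order; across rows nothing ties the two sequences together, which is the point of Wei's follow-up question.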
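Lars's distinction between visibility to concurrent scanners (governed by the memTS) and what a client sees (chosen by the TS) can be illustrated with a toy single-cell model. It is a sketch under stated assumptions, not HBase's MVCC implementation: VisibilitySketch and its methods are hypothetical, and the rule that the read point only advances once every earlier write has completed is an assumption of the model. The scenario uses client-assigned TSs, since Lars notes that this is how a later memTS can carry an earlier TS within one row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical single-cell toy model, NOT HBase code:
// memTS decides what a scanner may see, TS decides which visible version is "latest".
public class VisibilitySketch {
    record Version(String value, long ts, long memTs) {}

    private long nextWriteNumber = 0;                        // last memTS handed out
    private long readPoint = 0;                              // new scanners see memTS <= readPoint
    private final TreeSet<Long> finished = new TreeSet<>();  // completed, not yet under readPoint
    private final List<Version> cell = new ArrayList<>();

    // Starts a write: assigns the next memTS and stores the version, still invisible.
    synchronized long beginWrite(String value, long ts) {
        long memTs = ++nextWriteNumber;
        cell.add(new Version(value, ts, memTs));
        return memTs;
    }

    // Marks a write complete; the read point only advances over a gap-free run of completed writes.
    synchronized void completeWrite(long memTs) {
        finished.add(memTs);
        while (finished.remove(readPoint + 1)) {
            readPoint++;
        }
    }

    // What a new scanner returns: up to maxVersions visible versions, highest TS first.
    synchronized List<Version> read(int maxVersions) {
        List<Version> visible = new ArrayList<>();
        for (Version v : cell) {
            if (v.memTs() <= readPoint) {
                visible.add(v);
            }
        }
        visible.sort((a, b) -> Long.compare(b.ts(), a.ts()));
        return List.copyOf(visible.subList(0, Math.min(maxVersions, visible.size())));
    }

    public static void main(String[] args) {
        VisibilitySketch cell = new VisibilitySketch();
        // Wei's "else" case: memts(v1) < memts(v2) but ts(v1) > ts(v2) (client-assigned TSs).
        long w1 = cell.beginWrite("v1", 200);
        long w2 = cell.beginWrite("v2", 100);
        cell.completeWrite(w2);
        System.out.println("w2 done, w1 pending: " + cell.read(1)); // read point held back by w1
        cell.completeWrite(w1);
        System.out.println("both done, latest:   " + cell.read(1)); // v1 wins on TS
        System.out.println("both done, all:      " + cell.read(3)); // v2 is stale but not lost
    }
}
```

The write of v2 finishes first but stays hidden until v1 finishes; once both are visible, v1 wins on TS and v2 is immediately stale, yet it is still returned when the client asks for more versions, in line with Lars's point that the older edit is not lost.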