hbase-dev mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: memstore timestamp and visible timestamp
Date Fri, 03 Aug 2012 21:27:22 GMT
Time can go backwards: once a year, by one hour.
I may be wrong, but it seems that the situation described by TS (memTS1 > memTS2 and ts1
< ts2) is possible. Under concurrent updates in a distributed environment, though, the only
way to guarantee "fairness" of operations is to put all of them into one global queue.
I really doubt that this is what people need (or want).


Upd: It is possible to keep a queue per server-row inside the RS. This is a question of how
we define the order of requests in a concurrent environment. We can have one global queue,
one queue per RS, or (at the lowest granularity) one queue per key-row, but the most
efficient way (and of course not the fairest) is to add an element of randomness:
let the OS decide which thread gets a time slot first.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com


________________________________________
From: lars hofhansl [lhofhansl@yahoo.com]
Sent: Friday, August 03, 2012 12:11 PM
To: dev@hbase.apache.org
Cc: hbase-dev@hadoop.apache.org
Subject: Re: memstore timestamp and visible timestamp

I see. This is not so much a stated guarantee as a fact following from the implementation.


The memTS is handed out per region server - which is fine, because the only consistency guarantee
HBase makes is for KVs of the same row,
and these are always colocated in the same region (and hence the same region server).
Since the region server also hands out the TSs based on wall clock time (and assuming time
does not go backwards) it follows that a KV assigned a later memTS cannot have an earlier
TS.

Of course that is not the case if you use client assigned TSs.
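
To make the argument above concrete, here is a minimal Python sketch. It is a toy model, not HBase code. It assumes that a single region server hands out both the memTS and the wall-clock TS, and it further assumes that both are taken inside one critical section. The names ToyRegionServer, put, fake_clock and ts_follows_mem_ts are hypothetical and exist only for this illustration.

```python
import itertools
import threading


class ToyRegionServer:
    """Toy model, not HBase code. Assumption: memTS and TS are taken atomically."""

    def __init__(self, clock):
        self._clock = clock                      # callable returning "now" in ms
        self._next_mem_ts = itertools.count(1)   # per-server write number
        self._lock = threading.Lock()

    def put(self, row, value, client_ts=None):
        with self._lock:
            mem_ts = next(self._next_mem_ts)
            ts = self._clock() if client_ts is None else client_ts
        return {"row": row, "value": value, "mem_ts": mem_ts, "ts": ts}


def fake_clock(ticks):
    """Hypothetical helper: replay a fixed list of clock readings."""
    readings = iter(ticks)
    return lambda: next(readings)


def ts_follows_mem_ts(kvs):
    """True if sorting by memTS never yields a decreasing TS."""
    by_mem_ts = sorted(kvs, key=lambda kv: kv["mem_ts"])
    return all(a["ts"] <= b["ts"] for a, b in zip(by_mem_ts, by_mem_ts[1:]))


# Server-assigned TS, clock never goes backwards (two puts share one ms).
rs = ToyRegionServer(fake_clock([100, 100, 101]))
server_assigned = [rs.put("k", v) for v in ("v1", "v2", "v3")]

# Client-assigned TS: the later memTS carries the earlier TS.
rs = ToyRegionServer(fake_clock([]))
client_assigned = [rs.put("k", "v1", client_ts=200), rs.put("k", "v2", client_ts=50)]

# Server-assigned TS, but the clock steps backwards between the two puts.
rs = ToyRegionServer(fake_clock([100, 40]))
clock_went_back = [rs.put("k", "v1"), rs.put("k", "v2")]

print(ts_follows_mem_ts(server_assigned))   # True
print(ts_follows_mem_ts(client_assigned))   # False
print(ts_follows_mem_ts(clock_went_back))   # False
```

The ordering holds in this model only because of the two stated assumptions. If the two values were taken in separate steps by concurrent threads, or if the clock stepped backwards, a later memTS could carry an earlier TS.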

Maybe I should write a followup blog post that more clearly describes the relationship (or
rather the absence thereof) between the memTS and the TS.


The gist is that the memTS is strictly internal, used to guarantee ACID properties (HBase could
have used read locks for this as well, and if it did, that would be transparent to the outside),
whereas the TS is an application-level concept; it is part of the data (so to speak).


-- Lars
________________________________
From: Wei Tan <wtan@us.ibm.com>
To: dev@hbase.apache.org
Cc: "hbase-dev@hadoop.apache.org" <hbase-dev@hadoop.apache.org>
Sent: Friday, August 3, 2012 7:21 AM
Subject: Re: memstore timestamp and visible timestamp

Hi Lars,

   Appreciate your reply. Actually I read your blog posting and then had
that question. I am very interested in how you guarantee this:

   Also note that if you use the Region Server assigned TSs then mTS1<mTS2
implies TS1<=TS2 (the update might happen with the same ms).

  If you have a pointer explaining this, I would like to read it.
Otherwise I will dig into the code later today. I remember reading the 0.92.0
code and not finding many clues. But I will try again.



Best Regards,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
19 Skyline Dr, Hawthorne, NY  10532
wtan@us.ibm.com; 914-784-6752



From:   lars hofhansl <lhofhansl@yahoo.com>
To:     "dev@hbase.apache.org" <dev@hbase.apache.org>,
"hbase-dev@hadoop.apache.org" <hbase-dev@hadoop.apache.org>,
Date:   08/02/2012 07:35 PM
Subject:        Re: memstore timestamp and visible timestamp



Hi Wei,

you have to distinguish between "visible to other concurrent scanners" and
"visible to a client".
What's visible to a client is determined by what the client wants to see
based on the application visible timestamp (TS).

The visibility to concurrent scanners is controlled by the memstoreTS
(mTS) to avoid "strange" states due to parallel updates.
HBase here guards against partially visible "transactions" (i.e. a Put of
many columns that fails after it applied the changes to some of the
columns).
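
As a rough illustration of the scanner-side visibility described above, the following Python sketch models a read point that only advances once a write has fully completed. It is a toy under stated assumptions, not the HBase implementation: ToyMemstore, begin_put, complete_put and scan are hypothetical names, and the rule that the read point advances only past a contiguous run of completed writes is an assumption of this model.

```python
class ToyMemstore:
    """Toy MVCC model, not HBase code: scanners see only cells at or below the read point."""

    def __init__(self):
        self.cells = []          # (column, value, mem_ts) in arrival order
        self.read_point = 0      # highest memTS visible to scanners
        self._next_write = 1
        self._completed = set()

    def begin_put(self, columns):
        """Apply all columns of one Put under a single new memTS (not yet visible)."""
        mem_ts = self._next_write
        self._next_write += 1
        for column, value in columns.items():
            self.cells.append((column, value, mem_ts))
        return mem_ts

    def complete_put(self, mem_ts):
        """Mark a Put complete; advance the read point past contiguous completed writes."""
        self._completed.add(mem_ts)
        while self.read_point + 1 in self._completed:
            self.read_point += 1

    def scan(self):
        """Return the columns visible to a scanner that starts now."""
        return {c: v for c, v, m in self.cells if m <= self.read_point}


store = ToyMemstore()
write = store.begin_put({"cf:a": "1", "cf:b": "2"})
seen_during_put = store.scan()   # nothing from the in-flight Put is visible
store.complete_put(write)
seen_after_put = store.scan()    # both columns appear together

print(seen_during_put)
print(seen_after_put)
```

A scanner in this model never observes only some of the columns of a Put, which is the "partially visible transaction" case the memstoreTS is meant to prevent. The application-level TS plays no role in this check.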

The scenario you describe below is indeed desired. Note that a client can
request to see the older versions too, so the older edit (in terms of TS) is
not lost.
Also note that if you use the Region Server assigned TSs then mTS1<mTS2
implies TS1<=TS2 (the update might happen with the same ms).
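
The client-side view can be sketched the same way. The Python snippet below is a minimal model, not the HBase client API: get_versions and the dictionary layout are hypothetical. It assumes both edits have already committed and shows that the version a client sees is chosen by TS alone, while the older edit stays retrievable when more versions are requested.

```python
def get_versions(cells, max_versions=1):
    """Return committed values newest-first by application TS; memTS is ignored."""
    newest_first = sorted(cells, key=lambda cell: cell["ts"], reverse=True)
    return [cell["value"] for cell in newest_first[:max_versions]]


# Two committed edits of the same row and column. v2 committed later (higher
# memTS) but carries the older TS, as in the scenario quoted in this thread.
cells = [
    {"value": "v1", "mem_ts": 1, "ts": 200},
    {"value": "v2", "mem_ts": 2, "ts": 100},
]

latest = get_versions(cells)                   # only the newest version by TS
history = get_versions(cells, max_versions=2)  # the older edit is still there

print(latest)
print(history)
```

Here v2 is never the latest version even though it committed last, yet it is not lost: asking for two versions returns it after v1.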

If you do not mind a longer read, I have written about this here:
http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html

Let me know if that makes any sense.

-- Lars


----- Original Message -----
From: Wei Tan <wtan@us.ibm.com>
To: hbase-dev@hadoop.apache.org
Cc:
Sent: Thursday, August 2, 2012 3:35 PM
Subject: memstore timestamp and visible timestamp

Hi,

  I have a question regarding the correlation between the visible
timestamp of a KV (denoted as ts) and its memstore timestamp (aka, the
write number, denoted as memts). Reading the HRegion.java code it seems
that these two are independently assigned. Let's assume two concurrent
put: (k, v1) and (k, v2)


  Suppose somehow memts(k,v1) < memts(k, v2); then (k,v1) will be committed
and visible before (k,v2).
If ts(k,v1) < ts(k, v2), then after both KVs commit, (k,v2) becomes the
latest version.
Otherwise, if ts(k,v1) > ts(k, v2), then once the "later" (w.r.t. MVCC) KV
commits, it immediately becomes stale and is still not visible. --- Is this a
desirable feature?


  Am I understanding it correctly, that memts(k,v1) < memts(k, v2) does
not indicate that ts(k,v1) < ts(k, v2), and vice versa?
PS: let's talk about the hbase region server assigned, not user assigned,
visible timestamp.

  Thanks,

Wei


