hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Rawson <ryano...@gmail.com>
Subject Re: incrementColumnValue, WAL and timestamp
Date Wed, 12 Jan 2011 20:32:44 GMT
That code was written to handle the pathological case where the same
timestamp would be in memstore and hfile and we'd get a bad read and
base counter updates on the wrong value.  That is we'd have 2 KVs:
KV1 ts=1  value=1500
KV2 ts=1  value=500

and we'd base the increment on KV2, thus 'losing' 1000 increments.
With seqid we can determine that KV1 > KV2 even though ts=1 for both
and resolve without doing the Wrong Thing.  That we don't use the same
KV is because you don't want to mutate a KV that someone else in a
different thread (log flushing) might depend on.  Unfortunately
updating timestamp which is a long isnt atomic in KV, since you are
updating 8 byte fields one at a time.

Increment has always been atomic, and making multi increment atomic
will be a little trickier, the pseudo code is:
- beginMemstoreInsert(seqid);
- store.memstore.add(kv);
- commitMemstoreInsert(seqid);
- store.memstore.remove_versions(kv);

the last being necessary to reduce version bloat.

On Wed, Jan 12, 2011 at 12:14 PM, Stack <stack@duboce.net> wrote:
> On Wed, Jan 12, 2011 at 12:02 PM, Jonathan Gray <jgray@fb.com> wrote:
>> I think there are some hacky fixes in there to prevent duplicate timestamps.
> Ok.
>> With the new seqid, this should not be an issue.  I think we should then be able
to generate a single now() value and use it everywhere (MemStore, HLog).  Duplicate timestamps
won't be a problem as the seqid ordering will take care of it and we can still use the same
logic to cleanup old values.
> Right.
> OK if I strip back all I regard as 'workaround' code as part of my
> first cut at 2856?
>> A separate issue is how "atomic" we want counters to be.  Currently the multi-column
Increment operation does not utilize any of the RWCC stuff.  I guess here we should just
use seqid now?  Is it one seqid per KV or per "row transaction"?
> I chatted w/ Ryan on this yesterday.  For single increment, it uses
> RWCC.  We'd need to change the ordering of RWCC updates a little so
> that the increment was immediately available (previous we had a
> memstoreTs set to 0 as a means for doing this -- this goes away now).
> I'll have to look at the multi-increment.  Sounds like it should go
> under RWCC controls.
> Currently planning one sequence number per batch of edits, per write to WAL.
> St.Ack

View raw message