hbase-user mailing list archives

From Rod Cope <rod.c...@openlogic.com>
Subject Problem with flushing and identical timestamps
Date Tue, 26 Jan 2010 17:36:24 GMT
Hi,

I'm seeing behavior on 0.20.2 and 0.20.3 that doesn't seem quite right and
would like to know if this is by design, a bug, or something I'm doing
wrong.

Background:

When I do a put that includes a timestamp like this (conceptually - I know
this is not the actual API), it works just fine.
  put "table", "family", "column", "bbb", 12345

Then, if I do another put in the same client code using the same timestamp
like this...
  put "table", "family", "column", "aaa", 12345

...and I create a scanner, grab a Result, and iterate over all values using
list(), I get this...
  "table", "family", "column", "aaa", 12345

So far, so good.  Now, if I truncate the table from the shell and run a new
program that does a flush() on the table between the two puts, but does it
in the same client program back-to-back, I also get the same results from
list().

-----

Problem:

Here's where the trouble starts.  I truncate the table and run a new program
that puts "bbb", flushes the table, and quits.  Here's what I get from
list():
  "table", "family", "column", "bbb", 12345

Then I run another program that puts "aaa", flushes, and quits.  Here's what
I get from list():
  "table", "family", "column", "aaa", 12345
  "table", "family", "column", "bbb", 12345

And if I then run a third program that puts "ccc", flushes, and quits, I get
this from list():
  "table", "family", "column", "ccc", 12345
  "table", "family", "column", "bbb", 12345
  "table", "family", "column", "aaa", 12345

I'm getting three different values for identical
table/family/qualifier/timestamp tuples.  Does this seem right?  There also
doesn't seem to be a defined sort order, probably because the timestamps are
identical.
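
To make the condition easy to check for, here is a minimal plain-Java sketch (no HBase dependency) that scans a list of cells and reports any family/qualifier/timestamp key that carries more than one value. The Cell class and findDuplicates helper are hypothetical stand-ins for whatever the client gets back from list(); they are not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateCellCheck {
    /** Hypothetical stand-in for one entry returned by list(); not an HBase class. */
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Groups values by family:qualifier@timestamp and keeps only keys with two or more values. */
    static Map<String, List<String>> findDuplicates(List<Cell> cells) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (Cell c : cells) {
            String key = c.family + ":" + c.qualifier + "@" + c.timestamp;
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(c.value);
        }
        // Drop keys that have a single value; what remains are the conflicts.
        byKey.values().removeIf(values -> values.size() < 2);
        return byKey;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("family", "column", 12345L, "ccc"));
        cells.add(new Cell("family", "column", 12345L, "bbb"));
        cells.add(new Cell("family", "column", 12345L, "aaa"));
        System.out.println(findDuplicates(cells));
    }
}
```

Fed the three entries from the last listing, the helper reports one key with three values. It only detects the conflict; it cannot say which value is the intended one.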

Also, if instead of using list(), I use getMap(), then I always only get a
single result.  The single result is always the last item in the lists above
(i.e., "bbb" then "bbb" then "aaa").  I get identical results from using
getNoVersionMap().
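
The "last item wins" pattern is what one would expect if the map were built by inserting the list entries, in order, into a map keyed by timestamp. That is an assumption about how the result is assembled, not something confirmed here. A minimal plain-Java sketch of that assumption, with a hypothetical collapse helper:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class LastWriteWinsSketch {
    /**
     * Hypothetical helper: folds values that all share one timestamp into a
     * timestamp-keyed map, inserting them in list order.
     */
    static String collapse(List<String> valuesInListOrder, long timestamp) {
        TreeMap<Long, String> byTimestamp = new TreeMap<>();
        for (String value : valuesInListOrder) {
            // Same key every time, so each put replaces the previous value.
            byTimestamp.put(timestamp, value);
        }
        return byTimestamp.get(timestamp);
    }

    public static void main(String[] args) {
        System.out.println(collapse(Arrays.asList("bbb"), 12345L));               // bbb
        System.out.println(collapse(Arrays.asList("aaa", "bbb"), 12345L));        // bbb
        System.out.println(collapse(Arrays.asList("ccc", "bbb", "aaa"), 12345L)); // aaa
    }
}
```

Under this assumption the surviving value depends only on list order, which matches the "bbb", "bbb", "aaa" sequence reported above.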

I suspect that this same behavior could occur when HBase decides to flush on
its own, but I could be wrong.  As you can imagine, this can cause problems
because clients can't know from the results of calling list() which value is
"right" or "newest".  They also can't rely on getMap() or getNoVersionMap()
because the single result that gets returned is not necessarily "right" or
"newest".

I've reproduced everything above in a stand-alone installation and also with
a 7 regionserver cluster with the final 0.20.3.  I started down this
debugging path originally because I ran into this problem on the 7
regionserver cluster with one table of 100+ regions.  I was flushing
programmatically at the end of some large imports because I'm doing
setWriteToWAL(false) for load performance.

Am I doing something wrong?  Did I miss an HBase assumption about flushing
and/or identical timestamps?

Any help would be much appreciated.

Thanks,
Rod

--

Rod Cope
CTO & Founder
OpenLogic, Inc.

