hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-613) Timestamp-anchored scanning fails to find all records
Date Mon, 26 May 2008 22:47:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599925#action_12599925 ]

Jim Kellerman commented on HBASE-613:
-------------------------------------

Demonstrating the bug:

On an unpatched trunk, run PerformanceEvaluation:
{code}
$ hadoop-0.17.0/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
{code}

Apply Timestamp.patch (to get the test program), run ant compile-test, and then run the program:
{code}
$ hbase/bin/hbase org.apache.hadoop.hbase.Timestamp time
latest timestamp: 9223372036854775807
{code}

Note that the timestamp returned is the value of HConstants.LATEST_TIMESTAMP.
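
The printed value is also Long.MAX_VALUE, so a sentinel like this can be told apart from a real cell timestamp with a simple comparison. A minimal sketch, assuming a stand-alone class (TimestampSentinel and isSentinel are hypothetical names, and the constant is a local copy of the value printed above rather than an import of HConstants):

```java
public class TimestampSentinel {
    // Local copy of the value printed by the test program; assumed here to
    // mirror HConstants.LATEST_TIMESTAMP. It equals Long.MAX_VALUE.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Returns true when ts is the "latest" sentinel rather than a real
    // insertion time.
    static boolean isSentinel(long ts) {
        return ts == LATEST_TIMESTAMP;
    }

    public static void main(String[] args) {
        System.out.println(isSentinel(9223372036854775807L)); // true
        System.out.println(isSentinel(1211839273332L));       // false
    }
}
```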

Counting the number of rows with the returned value returns the correct result:
{code}
$ hbase/bin/hbase org.apache.hadoop.hbase.Timestamp count 9223372036854775807
number of rows: 1048576
{code}

Of course, that is not really the timestamp of the most recently inserted row. So shut down
HBase and restart it (and allow it to settle down with respect to region balancing). This
flushes the caches and may trigger compactions on restart.
{code}
$ hbase/bin/stop-hbase.sh 
stopping master.......................
$ hbase/bin/start-hbase.sh 
starting master, logging to /bfd/jim/hbase/logs/hbase-jim-master-xx.foo.com.out
xx.foo.com: starting regionserver, logging to /bfd/jim/hbase/logs/hbase-jim-regionserver-xx.foo.com.out
yy.foo.com: starting regionserver, logging to /bfd/jim/hbase/logs/hbase-jim-regionserver-yy.foo.com.out
zz.foo.com: starting regionserver, logging to /bfd/jim/hbase/logs/hbase-jim-regionserver-zz.foo.com.out
vv.foo.com: starting regionserver, logging to /bfd/jim/hbase/logs/hbase-jim-regionserver-vv.foo.com.out
{code}

Running the program again to get the timestamp of the latest cell inserted, we get:
{code}
$ hbase/bin/hbase org.apache.hadoop.hbase.Timestamp time
latest timestamp: 1211839273332
{code}

This is a much more reasonable value. Even counting the number of rows with this timestamp works properly:
{code}
$ hbase/bin/hbase org.apache.hadoop.hbase.Timestamp count 1211839273332
number of rows: 1048576
{code}
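
To sanity-check a value like this, the epoch-millisecond timestamp can be decoded into a calendar time. A minimal sketch, assuming Java 8 or later (java.time) and a hypothetical stand-alone class DecodeTimestamp; it does not touch any HBase API:

```java
import java.time.Instant;

public class DecodeTimestamp {
    // Convert an epoch-millisecond cell timestamp to an ISO-8601 UTC string.
    static String toUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // Timestamp reported by the test program after the restart.
        System.out.println(toUtc(1211839273332L)); // 2008-05-26T22:01:13.332Z
    }
}
```

The decoded time falls shortly before this comment was posted (22:47:57 GMT on the same day), which is consistent with it being a real insertion time.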

If we run the PerformanceEvaluation test again (without shutting down or re-initializing the
table), we get the wrong number of rows for the original timestamp:
{code}
$ hbase/bin/hbase org.apache.hadoop.hbase.Timestamp count 1211839273332
number of rows: 224384
{code}

and the latest timestamp reverts to the value of HConstants.LATEST_TIMESTAMP:
{code}
$ hbase/bin/hbase org.apache.hadoop.hbase.Timestamp time
latest timestamp: 9223372036854775807
{code}



> Timestamp-anchored scanning fails to find all records
> -----------------------------------------------------
>
>                 Key: HBASE-613
>                 URL: https://issues.apache.org/jira/browse/HBASE-613
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.2.0
>
>         Attachments: TestTimestampScanning.java, Timestamp.patch
>
>
> If I add 3 versions of a cell and then scan across the first set of added cells using
> a timestamp that should only get values from the first upload, a bunch are missing (I added
> 100k on each of the three uploads).  I thought it was the fact that we set the number of cells
> found back to 1 in HStore when we move off the current row/column, but that doesn't seem to be
> it.  I also tried upping the MAX_VERSIONs on my table and that seemed to have no effect.
> Need to look closer.
> Build a unit test because replicating on cluster takes too much time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

