hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-613) Timestamp-anchored scanning fails to find all records
Date Fri, 23 May 2008 01:27:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599261#action_12599261 ]

Jim Kellerman commented on HBASE-613:
-------------------------------------

This is ugly.

Because InternalScanner specifies next as:
{code}
public boolean next(HStoreKey key, SortedMap<byte[], byte[]> results)
{code}
it is only possible to return one timestamp for all the results of a row. Consequently, Cells
on the client side are meaningless with respect to timestamp. MemcacheScanner.next sets the
timestamp in the HStoreKey to the timestamp requested when the scanner was created. So if
the timestamp requested when the scanner was created was HConstants.LATEST_TIMESTAMP, that
is what gets set. Similarly, if a specific timestamp was set when the scanner was created,
older entries in the cache may not be found.
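A minimal sketch of the information loss described above, under stated assumptions: RowKey is a hypothetical stand-in for HStoreKey (one row, one timestamp), column names and row keys are Strings for readability, and HConstants.LATEST_TIMESTAMP is assumed to be Long.MAX_VALUE. The two cells were written at different times, but the old-style next() can only report them through the single timestamp on the key, which is overwritten with the scanner's requested timestamp.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for HStoreKey: one row key and ONE timestamp.
final class RowKey {
    String row;
    long timestamp;
}

final class OldContractDemo {
    // Mimics the old next(key, results) contract. As the comment describes for
    // MemcacheScanner.next, the key's timestamp is set to the timestamp the
    // scanner was opened with, not to the timestamps of the individual cells.
    static boolean next(RowKey key, SortedMap<String, byte[]> results, long scannerTimestamp) {
        key.row = "row1";
        key.timestamp = scannerTimestamp;
        results.put("info:a", "written at ts=100".getBytes());
        results.put("info:b", "written at ts=200".getBytes());
        return true;
    }

    public static void main(String[] args) {
        long latest = Long.MAX_VALUE; // assumed value of HConstants.LATEST_TIMESTAMP
        RowKey key = new RowKey();
        SortedMap<String, byte[]> results = new TreeMap<>();
        next(key, results, latest);
        // Both columns can only be reported with key.timestamp; 100 and 200 are lost.
        for (String column : results.keySet()) {
            System.out.println(column + " @ " + key.timestamp);
        }
    }
}
```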

So what I propose is to change InternalScanner.next to be:
{code}
public byte[] next(SortedMap<byte[], Cell> results)
{code}
where the return value is the row key, or null if there are no results, and the results map's
keys are column names and its values are Cell (value, timestamp) pairs. Because each Cell
carries its own timestamp, this should make it easier to determine which results should be
returned.
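A rough sketch of the proposed contract, not the actual HBase code: Cell, ProposedScanner and InMemoryScanner are hypothetical names, and row keys and column names are Strings instead of byte[] to avoid needing a byte-array comparator. The point it illustrates is that each column's Cell keeps its own timestamp, and exhaustion is signalled by a null row key rather than a boolean.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical stand-in for Cell: a value paired with its own timestamp.
final class Cell {
    final byte[] value;
    final long timestamp;

    Cell(byte[] value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Sketch of the proposed contract: returns the row key, or null when there are
// no more rows. Row keys are Strings here; the proposal returns byte[].
interface ProposedScanner {
    String next(SortedMap<String, Cell> results);
}

// Toy in-memory implementation over row -> (column -> Cell).
final class InMemoryScanner implements ProposedScanner {
    private final Iterator<Map.Entry<String, SortedMap<String, Cell>>> rows;

    InMemoryScanner(SortedMap<String, SortedMap<String, Cell>> table) {
        this.rows = table.entrySet().iterator();
    }

    @Override
    public String next(SortedMap<String, Cell> results) {
        results.clear();
        if (!rows.hasNext()) {
            return null; // no more rows
        }
        Map.Entry<String, SortedMap<String, Cell>> row = rows.next();
        results.putAll(row.getValue()); // each Cell keeps its own timestamp
        return row.getKey();
    }
}
```

A caller would loop with `while ((row = scanner.next(results)) != null)` and read `results.get(column).timestamp` per column, instead of relying on one key-level timestamp for the whole row.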


> Timestamp-anchored scanning fails to find all records
> -----------------------------------------------------
>
>                 Key: HBASE-613
>                 URL: https://issues.apache.org/jira/browse/HBASE-613
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.2.0
>
>         Attachments: TestTimestampScanning.java
>
>
> If I add 3 versions of a cell and then scan across the first set of added cells using
> a timestamp that should only get values from the first upload, a bunch are missing (I added
> 100k on each of the three uploads). I thought it was the fact that we set the number of cells
> found back to 1 in HStore when we move off the current row/column, but that doesn't seem to
> be it. I also tried upping the MAX_VERSIONS on my table and that seemed to have no effect.
> Need to look closer.
> Build a unit test because replicating on a cluster takes too much time.
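The behaviour the report expects can be sketched with a plain in-memory model, assuming a single hypothetical column and String row keys (AnchoredScanSketch and its methods are illustrative names, not HBase APIs): each row keeps its versions in a TreeMap keyed by timestamp, and a scan anchored at timestamp T returns, for every row, the newest version whose timestamp is at or below T via floorEntry. After three uploads, a scan anchored at the first upload's timestamp should find every row, each with the first upload's value.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

final class AnchoredScanSketch {
    // row -> (timestamp -> value) for a single hypothetical column.
    private final SortedMap<String, NavigableMap<Long, String>> table = new TreeMap<>();

    void put(String row, long timestamp, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(timestamp, value);
    }

    // Newest value written at or before the anchor timestamp, or null if none.
    static String valueAsOf(NavigableMap<Long, String> versions, long anchor) {
        Map.Entry<Long, String> e = versions.floorEntry(anchor);
        return e == null ? null : e.getValue();
    }

    // Scan every row as of the anchor; rows with no qualifying version are skipped.
    SortedMap<String, String> scanAsOf(long anchor) {
        SortedMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> row : table.entrySet()) {
            String v = valueAsOf(row.getValue(), anchor);
            if (v != null) {
                out.put(row.getKey(), v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        AnchoredScanSketch s = new AnchoredScanSketch();
        int rows = 1000;
        for (long ts = 1; ts <= 3; ts++) { // three uploads, as in the report
            for (int i = 0; i < rows; i++) {
                s.put(String.format("row%06d", i), ts, "upload" + ts);
            }
        }
        // Anchored at the first upload, every row should be found.
        System.out.println(s.scanAsOf(1).size()); // prints 1000
    }
}
```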

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

