hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps
Date Wed, 07 Jul 2010 21:28:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886092#action_12886092 ]

HBase Review Board commented on HBASE-2265:

Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

bq.  On 2010-07-07 13:58:43, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java, line 55
bq.  > <http://review.hbase.org/r/257/diff/2/?file=2159#file2159line55>
bq.  >
bq.  >     I think this information should be maintained in MemStore not inside this data structure. We might get rid of this data structure type and change to another one day. This makes it too hard to do that.

When we flush the memstore to a storefile, we pass a KeyValueSkipListSet object. That object
goes through several functions before reaching Store. If TimeRangeTracker does not live inside
KeyValueSkipListSet, we will have to change every flush-related function to take an extra
argument, and if we later decide to send another piece of information, we will have to add
yet more arguments. Keeping TimeRangeTracker inside KeyValueSkipListSet lets us pass the
information without changing all the flush-related functions. Would it still be better
to pass TimeRangeTracker as an additional argument?

- Pranav
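
To make the trade-off in this reply concrete, here is a minimal sketch of the "metadata travels with the container" approach. It is not the patch under review: `FlushPlumbingSketch`, `TrackedSet`, `flush`, and `writeStoreFile` are hypothetical names, and plain `long` timestamps stand in for KeyValues. The point it illustrates is that the intermediate flush step keeps its signature while the final step can still read the time range.

```java
import java.util.concurrent.ConcurrentSkipListSet;

/** Illustrative only: hypothetical names, not HBase's actual classes or signatures. */
final class FlushPlumbingSketch {

    /** Stand-in for a sorted in-memory set that also carries its own time range. */
    static final class TrackedSet {
        private final ConcurrentSkipListSet<Long> timestamps = new ConcurrentSkipListSet<>();
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        /** Adds an entry and widens the tracked range to include it. */
        synchronized void add(long timestamp) {
            timestamps.add(timestamp);
            min = Math.min(min, timestamp);
            max = Math.max(max, timestamp);
        }

        synchronized long min() { return min; }
        synchronized long max() { return max; }
    }

    /** Intermediate flush step: it only forwards the set and never mentions the range. */
    static long[] flush(TrackedSet snapshot) {
        return writeStoreFile(snapshot);
    }

    /** Final step: reads the range off the set it was handed, e.g. to store as file metadata. */
    static long[] writeStoreFile(TrackedSet snapshot) {
        return new long[] { snapshot.min(), snapshot.max() };
    }

    public static void main(String[] args) {
        TrackedSet set = new TrackedSet();
        set.add(30L);
        set.add(10L);
        long[] range = flush(set);
        System.out.println("min=" + range[0] + " max=" + range[1]);
    }
}
```

The alternative raised in the review would keep the range in the owning store object and add it as a parameter to `flush` and to every function between it and `writeStoreFile`. That decouples the range from the container type, at the cost of touching each signature along the way.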

> HFile and Memstore should maintain minimum and maximum timestamps
> -----------------------------------------------------------------
>                 Key: HBASE-2265
>                 URL: https://issues.apache.org/jira/browse/HBASE-2265
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
>            Assignee: Pranav Khaitan
> In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have HFile and Memstore track their maximum and minimum timestamps. This has the following nice properties:
> - for a straight Get, if an entry has already been found with timestamp X, and X >= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, the current fast behavior of get can be maintained for those who use strictly increasing timestamps, while still giving "correct" behavior for those who sometimes write out-of-order.
> - for a scan, the "latest timestamp" of the storage can be used to decide which cell wins, even if the timestamps of the cells are equal. In essence, rather than comparing timestamps, you are able to compare tuples of (row timestamp, storage.max_timestamp).
> - in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage A was flushed after storage B.
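
The first two properties in the description can be sketched roughly as follows. This is a hedged illustration, not the HBase implementation: `TimeRangeSketch`, `Range`, `Candidate`, `canSkipFile`, and `NEWEST_WINS` are all hypothetical names, and cells are reduced to bare timestamps.

```java
import java.util.Comparator;

/** Illustrative only: hypothetical names, not HBase's actual API. */
final class TimeRangeSketch {

    /** Smallest and largest timestamp seen by one storage unit (a memstore or a file). */
    static final class Range {
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        void include(long timestamp) {
            min = Math.min(min, timestamp);
            max = Math.max(max, timestamp);
        }

        long min() { return min; }
        long max() { return max; }
    }

    /**
     * Get: once an entry with timestamp X has been found, a file whose maximum
     * timestamp is <= X cannot hold a strictly newer entry, so it can be skipped.
     */
    static boolean canSkipFile(long foundTimestamp, Range file) {
        return foundTimestamp >= file.max();
    }

    /** Scan: a cell's timestamp paired with the max timestamp of the storage it came from. */
    static final class Candidate {
        final long cellTimestamp;
        final long storageMaxTimestamp;

        Candidate(long cellTimestamp, long storageMaxTimestamp) {
            this.cellTimestamp = cellTimestamp;
            this.storageMaxTimestamp = storageMaxTimestamp;
        }
    }

    /** Orders by cell timestamp first; equal timestamps are broken by the storage's max timestamp. */
    static final Comparator<Candidate> NEWEST_WINS =
        Comparator.comparingLong((Candidate c) -> c.cellTimestamp)
                  .thenComparingLong(c -> c.storageMaxTimestamp);

    public static void main(String[] args) {
        Range file = new Range();
        file.include(50L);
        file.include(100L);
        System.out.println("skip after finding ts=100: " + canSkipFile(100L, file));
        System.out.println("skip after finding ts=99: " + canSkipFile(99L, file));
    }
}
```

With `NEWEST_WINS`, two cells that share a timestamp are ordered by which storage has the later maximum timestamp, which is the tuple comparison the description proposes.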

