hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps
Date Mon, 12 Jul 2010 21:23:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887542#action_12887542
] 

HBase Review Board commented on HBASE-2265:
-------------------------------------------

Message from: "Ryan Rawson" <ryanobjc@gmail.com>


bq.  On 2010-07-12 13:09:10, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 425
bq.  > <http://review.hbase.org/r/257/diff/4/?file=2332#file2332line425>
bq.  >
bq.  >     is this idiomatic? I dont think I've seen this particular pattern before?
bq.  
bq.  Pranav Khaitan wrote:
bq.      This is a utility method in Writables and it does exactly the task we want to get
done here. There is an alternate method getWritable which also returns an object of type Writable.
Do you think it would be better to use that?

ok i see, then this should be ok


bq.  On 2010-07-12 13:09:10, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 933
bq.  > <http://review.hbase.org/r/257/diff/4/?file=2328#file2328line933>
bq.  >
bq.  >     This doesnt appear to be germane to the issue at hand.  It shouldn't appear
in this patch.
bq.  
bq.  Pranav Khaitan wrote:
bq.      This method is required for this issue to check if a delete is a Column of Family
delete. It is used in TimeRangeTracker. If you suggest not adding this method, I can write
this code at the place I am using it at?

ok my bad. this is fine then.


bq.  On 2010-07-12 13:09:10, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java,
line 136
bq.  > <http://review.hbase.org/r/257/diff/4/?file=2333#file2333line136>
bq.  >
bq.  >     This doesnt seem germane to the issue at hand... Can we not do this here?
bq.  
bq.  Pranav Khaitan wrote:
bq.      I did this to pass Timerange information to ShouldSeek(). An alternative possibility
is to pass TimeRange as an additional parameter. We had a discussion about this in our team
here and we thought that since Scan contains all information, it is a neater way of passing
than adding more arguments. That way, in future if we add any more filtering techniques at
this level, we wont have to change the interface.
bq.      
bq.      Do you suggest making it two different functions: shouldSeek(row, columns) and shouldSeek(timeRange)
?
bq.      
bq.      Alternately, it can be combined into one function like shouldSeek(isGet, row, columns,
timeRange)

ok, lets keep it as then.


- Ryan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/257/#review357
-----------------------------------------------------------





> HFile and Memstore should maintain minimum and maximum timestamps
> -----------------------------------------------------------------
>
>                 Key: HBASE-2265
>                 URL: https://issues.apache.org/jira/browse/HBASE-2265
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
>            Assignee: Pranav Khaitan
>
> In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have HFile and Memstore
track their maximum and minimum timestamps. This has the following nice properties:
> - for a straight Get, if an entry has been already been found with timestamp X, and X
>= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, the current fast behavior
of get can be maintained for those who use strictly increasing timestamps, but "correct" behavior
for those who sometimes write out-of-order.
> - for a scan, the "latest timestamp" of the storage can be used to decide which cell
wins, even if the timestamp of the cells is equal. In essence, rather than comparing timestamps,
instead you are able to compare tuples of (row timestamp, storage.max_timestamp)
> - in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage A was
flushed after storage B.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message