hbase-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps
Date Thu, 25 Feb 2010 15:44:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838382#action_12838382 ]

Todd Lipcon commented on HBASE-2265:

bq. With a big spread of timestamps and keys, we wouldn't get much of an optimization

Exactly. If users are writing out of order, they cannot take advantage of the optimization
of culling older storage. As you mentioned, bloom filters help here. For users who are writing
in order, the performance should be identical to today's. I think this is exactly what we want.

bq. for a complete column family get, we'll have to touch every file, every time. This is
because you are never sure if the next file contains another key/value for the result. A bloom
filter would help here

Yep, and this is exactly what I would expect. Why should a column family get _not_ touch all
of the files?

bq. However, during a compaction, this information is collapsed, and we end up with the duplicate
key/values sitting next to each other. We might be able to cause/create an invariant that
during compaction the 'newer' one comes first

It's probably worth getting consensus, but I think it would be acceptable behavior to only
retain the keyval from the newest storage when the timestamps are equal. That is, if I write
A:ts=1, B:ts=2, C:ts=3, D:ts=3, E:ts=3, and want to retain "latest 3", I'd end up getting
writes A, B, and E.
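
To make the proposed tie-break concrete, here is a minimal Python sketch of "retain latest 3" under that rule. It is an illustration only, not HBase code: the `Write` type, the `storage_seq` field (a hypothetical flush-order number where higher means newer storage), and `retain_latest` are all assumed names, and the example assumes each of the five writes landed in a successively newer storage.

```python
from typing import Dict, List, NamedTuple


class Write(NamedTuple):
    name: str
    ts: int
    storage_seq: int  # hypothetical: higher means the storage was flushed later


def retain_latest(writes: List[Write], max_versions: int) -> List[Write]:
    """Keep one write per timestamp (the one from the newest storage),
    then keep the max_versions highest timestamps, newest first."""
    newest_per_ts: Dict[int, Write] = {}
    for w in writes:
        kept = newest_per_ts.get(w.ts)
        # On equal timestamps, the write from the newer storage wins.
        if kept is None or w.storage_seq > kept.storage_seq:
            newest_per_ts[w.ts] = w
    survivors = sorted(newest_per_ts.values(), key=lambda w: w.ts, reverse=True)
    return survivors[:max_versions]


# The example from the comment: C, D, and E share ts=3; E is in the newest storage.
writes = [
    Write("A", 1, 1),
    Write("B", 2, 2),
    Write("C", 3, 3),
    Write("D", 3, 4),
    Write("E", 3, 5),
]
print([w.name for w in retain_latest(writes, 3)])  # prints ['E', 'B', 'A']
```

C and D are dropped because E shares their timestamp and comes from newer storage, which leaves A, B, and E as the three retained versions.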

bq. Generally the ideal solution would involve no change to the KeyValue serialization format

I agree, and I think this can be done using only the existing metadata fields without any
change per-keyvalue.

> HFile and Memstore should maintain minimum and maximum timestamps
> -----------------------------------------------------------------
>                 Key: HBASE-2265
>                 URL: https://issues.apache.org/jira/browse/HBASE-2265
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
> In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have HFile and Memstore
> track their maximum and minimum timestamps. This has the following nice properties:
> - for a straight Get, if an entry has already been found with timestamp X, and
>   X >= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, the current fast behavior
>   of get can be maintained for those who use strictly increasing timestamps, while still giving
>   "correct" behavior for those who sometimes write out-of-order.
> - for a scan, the "latest timestamp" of the storage can be used to decide which cell
>   wins, even if the timestamps of the cells are equal. In essence, rather than comparing timestamps,
>   you are able to compare tuples of (row timestamp, storage.max_timestamp).
> - in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage A was
>   flushed after storage B.
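
The Get property in the description above can be sketched roughly as follows. This is a simplified Python model, not the regionserver implementation: `Storage` is a hypothetical stand-in for an HFile or Memstore (assumed non-empty), `get_latest` is an assumed name, and storages are assumed to be listed newest-flushed first so that the newer storage wins when timestamps are equal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


@dataclass
class Storage:
    """Hypothetical stand-in for an HFile or Memstore that tracks its timestamp range."""
    cells: Dict[str, List[Cell]]  # key -> versions stored in this file
    min_ts: int = field(init=False)
    max_ts: int = field(init=False)

    def __post_init__(self) -> None:
        all_ts = [ts for versions in self.cells.values() for ts, _ in versions]
        self.min_ts = min(all_ts)
        self.max_ts = max(all_ts)


def get_latest(key: str, storages: List[Storage]) -> Tuple[Optional[Cell], int]:
    """Return (newest cell for key, number of storages actually checked).

    storages must be ordered newest-flushed first.
    """
    best: Optional[Cell] = None
    checked = 0
    for s in storages:
        # Nothing in this storage can beat an entry with ts X >= s.max_ts.
        if best is not None and best[0] >= s.max_ts:
            continue
        checked += 1
        for cell in s.cells.get(key, []):
            # Strict comparison: on a tie, the entry from the newer storage is kept.
            if best is None or cell[0] > best[0]:
                best = cell
    return best, checked
```

With strictly increasing timestamps only the newest storage is read, which matches the current fast path. If a late write with an old timestamp sits in the newest storage, older storages are still consulted until an entry at or above their maximum timestamp has been found.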

