hbase-issues mailing list archives

From "Matt Corgan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6093) Flatten timestamps during flush and compaction
Date Tue, 29 May 2012 23:52:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285267#comment-13285267 ]

Matt Corgan commented on HBASE-6093:
------------------------------------

Oops - for flushes you would set all timestamps to the flush start time, like I said above.
But for compactions you would set all timestamps to the earliest timestamp in the compaction,
and ensure that only consecutive files get compacted together.
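
A minimal sketch of the compaction rule in the comment above, assuming a simplified model rather than HBase's actual classes. StoreFile, position, areConsecutive and flattenedTimestamp are hypothetical names invented for illustration: position stands for a file's place in flush order, and the flattened timestamp is the earliest timestamp found in any input file.

```java
import java.util.List;

// Hypothetical model for illustration only; these are not HBase classes.
final class FlattenCompactionSketch {

    // A store file reduced to its position in flush order and its cell timestamps.
    static final class StoreFile {
        final int position;
        final long[] timestamps;

        StoreFile(int position, long[] timestamps) {
            this.position = position;
            this.timestamps = timestamps;
        }
    }

    // True when the files, in the order given, sit at adjacent positions.
    static boolean areConsecutive(List<StoreFile> files) {
        for (int i = 1; i < files.size(); i++) {
            if (files.get(i).position != files.get(i - 1).position + 1) {
                return false;
            }
        }
        return true;
    }

    // Timestamp that every cell in the compacted output would carry.
    static long flattenedTimestamp(List<StoreFile> files) {
        if (!areConsecutive(files)) {
            throw new IllegalArgumentException("files are not consecutive");
        }
        boolean sawCell = false;
        long earliest = Long.MAX_VALUE;
        for (StoreFile file : files) {
            for (long ts : file.timestamps) {
                earliest = Math.min(earliest, ts);
                sawCell = true;
            }
        }
        if (!sawCell) {
            throw new IllegalArgumentException("no cells to compact");
        }
        return earliest;
    }
}
```

With files at positions 3 and 4 holding timestamps {500, 500} and {700}, flattenedTimestamp returns 500. Passing files at positions 3 and 5 is rejected, because a file in between would be skipped.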
                
> Flatten timestamps during flush and compaction
> ----------------------------------------------
>
>                 Key: HBASE-6093
>                 URL: https://issues.apache.org/jira/browse/HBASE-6093
>             Project: HBase
>          Issue Type: New Feature
>          Components: io, performance, regionserver
>            Reporter: Matt Corgan
>            Priority: Minor
>
> Many applications run with maxVersions=1 and do not care about timestamps, or they will
> specify one timestamp per row as a normal KeyValue rather than per-cell.
>
> Then, DataBlockEncoders like those in HBASE-4218 and HBASE-4676 often encode timestamps
> as diffs from the previous or diffs from the minimum timestamp in the block.  If all timestamps
> in a block are the same, they will all compress to basically <= 8 bytes total per block.
> This can be 10% to 25% space savings for some schemas, and that savings is realized both
> on disk and in block cache.
>
> We could add a ColumnFamily setting flattenTimestamps=[true/false].  If true, then all
> timestamps are modified during a flush/compaction to the currentTimeMillis() at the start
> of the flush/compaction.  If all timestamps are made identical in a file, then the encoder
> will be able to eliminate them.
>
> The simplest use case is probably that where all inserts are type=Put, there are no overwrites,
> and there are no deletes.  As use cases get more complex, then so does the implementation.
>
> For example, what happens when there is a Put and a Delete of the same cell in the same
> memstore?  Maybe for a flush at t=flushStartTime, the Put gets timestamp=t, and the Delete
> gets timestamp=t+1.  Or maybe HBASE-4241 could take care of this problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
