hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5263) Preserving cached data on compactions through cache-on-write
Date Fri, 08 Aug 2014 18:51:13 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

stack updated HBASE-5263:

    Component/s: Performance

> Preserving cached data on compactions through cache-on-write
> ------------------------------------------------------------
>                 Key: HBASE-5263
>                 URL: https://issues.apache.org/jira/browse/HBASE-5263
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache, Compaction, Performance
>            Reporter: Mikhail Bautin
>            Assignee: Rishit Shroff
>            Priority: Minor
> We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the block cache
on compactions if cache-on-write is enabled. However, it would be ideal to reduce the effect
compactions have on the cached data. For every block we are writing for a compacted file we
can decide whether it needs to be cached based on whether the original blocks containing the
same data were already in the cache. More precisely, for every HFile reader in a compaction we
can maintain a boolean flag saying whether the current key-value came from disk I/O or from the
block cache. In the HFile writer for the compaction's output we can maintain a flag that is
set if any of the key-values in the block being written came from a cached block, use that
flag at the end of a block to decide whether to cache-on-write the block, and reset the flag
to false on a block boundary. If such an inclusive approach would still trash the cache, we
could restrict the total number of blocks cached per output HFile, switch from "or" logic to
"and" logic when deciding whether to cache an output file block, or cache only a certain
percentage of the output file blocks that contain some of the previously cached data.

> Thanks to Nicolas for this elegant online algorithm idea!
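
The per-block flag described in the issue could be sketched roughly as follows. This is only an illustration of the idea, not HBase code: the class and method names (CompactionCacheSketch, BlockCacheDecider, decide) are hypothetical, the block size is counted in key-values rather than bytes for simplicity, and the "came from a cached block" bit is assumed to be supplied by the compaction's readers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposal; none of these names are HBase APIs. */
class CompactionCacheSketch {

    /** How the per-key-value "came from cache" bits are combined within one output block. */
    enum Mode { OR, AND }

    /** Tracks whether the output block currently being written should be cached on write. */
    static final class BlockCacheDecider {
        private final Mode mode;
        private boolean anyFromCache = false;
        private boolean allFromCache = true;
        private int count = 0;

        BlockCacheDecider(Mode mode) {
            this.mode = mode;
        }

        /** Call once for every key-value appended to the current output block. */
        void append(boolean cameFromCachedBlock) {
            anyFromCache |= cameFromCachedBlock;
            allFromCache &= cameFromCachedBlock;
            count++;
        }

        /** Call at a block boundary: returns the caching decision and resets the flags. */
        boolean finishBlock() {
            boolean decision = count > 0 && (mode == Mode.OR ? anyFromCache : allFromCache);
            anyFromCache = false;
            allFromCache = true;
            count = 0;
            return decision;
        }
    }

    /** Groups the flags into blocks of kvsPerBlock key-values and returns one decision per block. */
    static List<Boolean> decide(boolean[] fromCache, int kvsPerBlock, Mode mode) {
        BlockCacheDecider decider = new BlockCacheDecider(mode);
        List<Boolean> decisions = new ArrayList<>();
        int inBlock = 0;
        for (boolean flag : fromCache) {
            decider.append(flag);
            inBlock++;
            if (inBlock == kvsPerBlock) {
                decisions.add(decider.finishBlock());
                inBlock = 0;
            }
        }
        if (inBlock > 0) {
            // Flush the final, partially filled block.
            decisions.add(decider.finishBlock());
        }
        return decisions;
    }
}
```

With Mode.OR a block is cached if any of its key-values was read from a cached block (the inclusive approach), while Mode.AND requires every key-value in the block to have come from the cache, which is the stricter fallback mentioned in the description.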

This message was sent by Atlassian JIRA
