hbase-issues mailing list archives

From "Karthick Sankarachary (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3327) For increment workloads, retain memstores in memory after flushing them
Date Tue, 10 May 2011 00:34:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030970#comment-13030970 ]

Karthick Sankarachary commented on HBASE-3327:
----------------------------------------------

Just out of curiosity, is this issue still open? In other words, when we read from an {{HFile}}
right after it has been flushed (or compacted), is that read served strictly from memory? If
not, would the following approach address this issue (at the risk of sounding uneducated)?

- Define a {{Map<Path, BlockCache>}} in {{StoreFile}} that captures the {{BlockCache}}
objects used by writes, regardless of whether the write is triggered by a flush or a compaction.
- Look up the {{BlockCache}} in that map by the {{StoreFile}}'s {{Path}} at the time
we create a reader for it, and use that instead of an empty {{BlockCache}}.
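
To make the two steps above concrete, here is a minimal sketch of the idea. It does not use the real HBase classes: {{SketchBlockCache}} and {{WriteCacheRegistry}} are hypothetical stand-ins for {{BlockCache}} and the proposed map, a plain {{String}} stands in for {{Path}}, and handing the cache off to the first reader (rather than sharing it) is my assumption, made only to keep the map from growing without bound.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for HBase's BlockCache: maps a block offset to its bytes.
final class SketchBlockCache {
    private final Map<Long, byte[]> blocks = new ConcurrentHashMap<>();

    void cacheBlock(long offset, byte[] data) {
        blocks.put(offset, data);
    }

    byte[] getBlock(long offset) {
        return blocks.get(offset);
    }

    int size() {
        return blocks.size();
    }
}

// Hypothetical registry playing the role of the proposed Map<Path, BlockCache>.
// A String path is used here in place of Hadoop's Path to keep the sketch self-contained.
final class WriteCacheRegistry {
    private final Map<String, SketchBlockCache> cachesByPath = new ConcurrentHashMap<>();

    // Write path (flush or compaction): get or create the cache for the file being written.
    SketchBlockCache forWriter(String path) {
        return cachesByPath.computeIfAbsent(path, p -> new SketchBlockCache());
    }

    // Read path: hand the writer-populated cache to the reader if one exists,
    // otherwise fall back to an empty cache. The entry is removed on hand-off.
    SketchBlockCache forReader(String path) {
        SketchBlockCache warm = cachesByPath.remove(path);
        return warm != null ? warm : new SketchBlockCache();
    }
}
```

With this shape, a writer would call {{forWriter(path).cacheBlock(...)}} as it emits blocks, and the code that opens a reader for the same path would call {{forReader(path)}} and start with those blocks already in memory.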

Correct me if I'm wrong, but when "hbase.rs.cacheblocksonwrite" is true, we seem to be caching
blocks on writes regardless of whether we're flushing or compacting. If that's already the
case, we might as well make those block caches visible in the read path.

> For increment workloads, retain memstores in memory after flushing them
> -----------------------------------------------------------------------
>
>                 Key: HBASE-3327
>                 URL: https://issues.apache.org/jira/browse/HBASE-3327
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>
> This is an improvement based on our observation of what happens in an increment workload.
> The working set is typically small and is contained in the memstores.
> 1. The memstores get flushed because the limit on the number of WAL logs gets hit.
> 2. This in turn triggers compactions, which evict the block cache.
> 3. Flushing the memstore and evicting the block cache cause disk reads for increments
> coming in after this, because the data is no longer in memory.
> We could solve this elegantly by retaining the memstores AFTER they are flushed into
> files. This would mean we can quickly populate the new memstore with the working set of data
> from memory itself without having to hit disk. We can throttle the number of such memstores
> we retain, or the memory allocated to them. In fact, allocating a percentage of the block cache
> to this would give us a huge boost.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
