hbase-issues mailing list archives

From "Edward Bortnikov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
Date Sun, 21 Aug 2016 13:39:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429724#comment-15429724 ]

Edward Bortnikov edited comment on HBASE-16417 at 8/21/16 1:39 PM:
-------------------------------------------------------------------

Suggestion for Flush Policy, feel free to comment (smile). 

A new configuration parameter, IN_MEMORY_FLUSH_POLICY, will select one of three levels of
in-memory flush management, set per store (column family).

1. “none”. Semantics: no in-memory flush - status quo before the project started. 

2. “compact_index” (default). Semantics: 
     a. When a MemStore overflows, it is transformed into an immutable segment. Namely, its
index is flattened into a sorted array. 
     b. The new segment is pushed into the segment pipeline (list of immutable segments, sorted
by creation time). The pipeline segments are used for serving reads, along with the new MemStore
and the block cache. 
     c. A MemStore (disk) flush writes the oldest in-memory segment to a file. 
     d. When too many segments accumulate in the pipeline (e.g., above 3), their indices are
merged to reduce the number of files created by disk flushes. The threshold is not available
for end-user tuning. Implementation details: 
         - No copy happens below the index level - neither the Cell objects nor the binary
data are relocated. 
         - No redundant cells are eliminated, to avoid the costly SQM (ScanQueryMatcher) scan. 

3. “compact_data”. This mode targets use cases with high churn or high locality of writes.
Semantics (difference from 2d): 
     a. When too many segments accumulate in the pipeline, both their indices and their data are
merged, to reduce the memory footprint and postpone future disk I/O. 
         - Redundant cells are eliminated (SQM scan is applied). 
         - If MSLAB storage is used for binary data, then the data in the new segment created
by merge is relocated to new chunks. 
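
To make the difference between levels 2 and 3 concrete, here is a rough sketch of the pipeline
behaviour described above. It is not HBase code: InMemoryFlushSketch, Policy, Cell, Pipeline,
maxSegments and demo are all hypothetical names invented for illustration. It also simplifies
in three ways: "redundant" is taken to mean any older version of the same row, the merge
re-sorts everything instead of doing a k-way merge, and copying a Cell object stands in for
relocating data to new MSLAB chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: none of these names are HBase classes.
final class InMemoryFlushSketch {

    /** The three proposed values of IN_MEMORY_FLUSH_POLICY. */
    enum Policy { NONE, COMPACT_INDEX, COMPACT_DATA }

    /** Hypothetical stand-in for a cell: row key, timestamp, value. */
    static final class Cell {
        final String row;
        final long ts;
        final String value;
        Cell(String row, long ts, String value) {
            this.row = row; this.ts = ts; this.value = value;
        }
    }

    /** Rows ascending; within a row, newest timestamp first. */
    static final Comparator<Cell> ORDER = (a, b) -> {
        int byRow = a.row.compareTo(b.row);
        return byRow != 0 ? byRow : Long.compare(b.ts, a.ts);
    };

    static final class Pipeline {
        private final Policy policy;
        private final int maxSegments; // assumed fixed, e.g. 3
        private final List<Cell[]> segments = new ArrayList<>(); // oldest first

        Pipeline(Policy policy, int maxSegments) {
            if (policy == Policy.NONE) {
                throw new IllegalArgumentException("\"none\" means no pipeline");
            }
            this.policy = policy;
            this.maxSegments = maxSegments;
        }

        /** Steps 2a and 2b: flatten the overflowing MemStore, push the segment. */
        void inMemoryFlush(List<Cell> active) {
            Cell[] flat = active.toArray(new Cell[0]);
            Arrays.sort(flat, ORDER);
            segments.add(flat);
            if (segments.size() > maxSegments) {
                merge();
            }
        }

        /** Step 2c: a disk flush takes the oldest segment (null if empty). */
        Cell[] diskFlush() {
            return segments.isEmpty() ? null : segments.remove(0);
        }

        /** Steps 2d and 3a: collapse the pipeline into one segment. */
        private void merge() {
            List<Cell> all = new ArrayList<>();
            for (Cell[] s : segments) {
                all.addAll(Arrays.asList(s));
            }
            all.sort(ORDER); // simplification: re-sort instead of a k-way merge
            List<Cell> out = new ArrayList<>();
            String lastRow = null;
            for (Cell c : all) {
                if (policy == Policy.COMPACT_INDEX) {
                    out.add(c); // index only: same Cell object, nothing dropped
                } else if (!c.row.equals(lastRow)) {
                    // data merge: keep the newest version per row and copy it,
                    // standing in for relocation to new MSLAB chunks
                    out.add(new Cell(c.row, c.ts, c.value));
                }
                lastRow = c.row;
            }
            segments.clear();
            segments.add(out.toArray(new Cell[0]));
        }

        int segmentCount() { return segments.size(); }

        int cellCount() {
            int n = 0;
            for (Cell[] s : segments) n += s.length;
            return n;
        }
    }

    /** Runs four in-memory flushes that each rewrite row "hot". */
    static Pipeline demo(Policy policy) {
        Pipeline p = new Pipeline(policy, 3);
        for (int i = 1; i <= 4; i++) {
            List<Cell> active = new ArrayList<>();
            active.add(new Cell("hot", i, "v" + i));
            active.add(new Cell("row" + i, i, "x"));
            p.inMemoryFlush(active);
        }
        return p;
    }

    public static void main(String[] args) {
        for (Policy policy : new Policy[] {Policy.COMPACT_INDEX, Policy.COMPACT_DATA}) {
            Pipeline p = demo(policy);
            System.out.println(policy + ": segments=" + p.segmentCount()
                + ", cells=" + p.cellCount());
        }
    }
}
```

In this toy run the fourth in-memory flush pushes the pipeline above the assumed threshold of 3.
With COMPACT_INDEX the merged segment still holds all 8 cells. With COMPACT_DATA the three
stale versions of row "hot" are dropped, leaving 5 cells, which is the memory saving that
level 3 pays for with the extra scan and copy.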



> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
