hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
Date Wed, 04 Nov 2015 19:31:28 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990234#comment-14990234 ]

Eshcar Hillel commented on HBASE-13408:
---------------------------------------

Great comments and questions [~stack].
We will work on improving the document and code along the lines you suggested and per the code review. Meanwhile, here are some answers and clarifications:

bq. The part that will be flushed is the 'compacted' part?

Yes. Specifically, it would be the tail of the compaction pipeline, which is composed of a list of segments.
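
As an illustration of the structure (class and method names here are placeholders, not the ones in the patch): the active segment is pushed into the pipeline on an in-memory flush, and the oldest segment is pulled from the tail when flushing to disk.

{code:java}
import java.util.LinkedList;

// Illustrative sketch only -- names are placeholders, not the patch's actual classes.
class CompactionPipeline {
  // Oldest segment at the head of the list (the pipeline's tail), newest at the end.
  private final LinkedList<ImmutableSegment> segments = new LinkedList<>();

  // An in-memory flush pushes the current active segment into the pipeline.
  synchronized void push(ImmutableSegment segment) {
    segments.addLast(segment);
  }

  // A flush to disk pulls the oldest (already compacted) segment out of the
  // pipeline; it becomes the snapshot that is written to an HFile.
  synchronized ImmutableSegment pullTail() {
    return segments.isEmpty() ? null : segments.removeFirst();
  }
}

class ImmutableSegment { /* read-only cell data, e.g. a cell skip-list */ }
{code}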

bq. On name of the config., I think it should be IN_MEMORY_COMPACTION rather than COMPACTED

We’ll change the name; however, we feel it is better to have it off by default, at least until users/applications are fully aware of the implications of this feature.
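
For illustration, enabling the feature for a single in-memory column family could look like the sketch below. The attribute name follows the proposal above and is not final, so treat this as a sketch of the intended usage rather than the actual API.

{code:java}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class InMemoryCompactionExample {
  // Sketch only: the "IN_MEMORY_COMPACTION" attribute name is the one proposed
  // above and may still change; the feature stays off unless explicitly enabled.
  public static void createTable(Admin admin) throws Exception {
    HColumnDescriptor cf = new HColumnDescriptor("f");
    cf.setInMemory(true);                         // an in-memory column family
    cf.setValue("IN_MEMORY_COMPACTION", "true");  // opt in; off by default
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("t"));
    table.addFamily(cf);
    admin.createTable(table);
  }
}
{code}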

bq. Can the in-memory flush use same code as the flush-to-disk flush? Ditto on compaction?

Flush - no; compaction - yes.
The in-memory flush makes changes to in-memory data structures, while a disk flush writes to disk.
Once the compacted memstore fully supports the HFile format, it can share the same compaction code.

bq. what is the above (flush­total­size) for?
bq. can you be more clear on where the threshold for flush to disk is?

Currently a flush is called when the memstore size reaches 128MB; however, the region can tolerate an even larger memstore before blocking the update operations. So there is a lower bound for triggering a flush and an upper bound for triggering a flush while blocking update operations.
With flush-total-size we attempt to further refine these boundaries, and have a soft lower bound instead of a hard bound.
In the new solution the region can tolerate a memstore size larger than 128MB (but smaller than flush-total-size) before calling a flush to disk, knowing that the size is not necessarily monotonically increasing between flushes. We distinguish between the data that is in active segments (which are still bounded by 128MB) and the overflow segments being compacted. The size of all data in the memstore is bounded by flush-total-size, where flush-size < flush-total-size < flush-blocking-size.
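
A rough sketch of this decision logic, with made-up field and method names (the real checks live in the region/flush code and may differ):

{code:java}
// Illustrative only: the bounds flush-size < flush-total-size < flush-blocking-size.
class FlushPolicySketch {
  long flushSize;          // e.g. 128MB: triggers an in-memory flush of the active segment
  long flushTotalSize;     // soft bound on the whole memstore: triggers a flush to disk
  long flushBlockingSize;  // hard bound: updates are blocked until a flush frees memory

  enum Action { NONE, IN_MEMORY_FLUSH, FLUSH_TO_DISK, BLOCK_UPDATES_AND_FLUSH }

  Action decide(long activeSegmentsSize, long totalMemstoreSize) {
    if (totalMemstoreSize >= flushBlockingSize) return Action.BLOCK_UPDATES_AND_FLUSH;
    if (totalMemstoreSize >= flushTotalSize)    return Action.FLUSH_TO_DISK;
    if (activeSegmentsSize >= flushSize)        return Action.IN_MEMORY_FLUSH;
    return Action.NONE;
  }
}
{code}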

bq. What is a snapshot in this scheme? we have to do a merge sort on flush to make the hfile?

The snapshot is a single immutable segment that is *not* subject to compaction. There is no
need to do a merge sort on flush to disk.

bq. Do we hold the region lock while we compact the in-memory segments on a column family?
Every time a compaction runs, it compacts all segments in the pipeline?

No - the lock is held only while making the changes to the in-memory data structures: removing the tail segment from the compaction pipeline and moving it to the snapshot.
Yes - currently a compaction compacts all segments in the pipeline.
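
To illustrate the lock scope (reusing the placeholder types from the pipeline sketch above): only the pointer swap happens under the lock; compacting pipeline segments and writing the snapshot to disk happen outside it.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: the lock is held just for the in-memory pointer swap.
class FlushPreparationSketch {
  private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
  private ImmutableSegment snapshotForFlush;

  void prepareFlushToDisk(CompactionPipeline pipeline) {
    updatesLock.writeLock().lock();
    try {
      // Remove the oldest (tail) segment from the pipeline; it becomes the snapshot.
      snapshotForFlush = pipeline.pullTail();
    } finally {
      updatesLock.writeLock().unlock();
    }
    // Writing the snapshot to an HFile, and compacting the remaining pipeline
    // segments, both happen without holding the lock.
  }
}
{code}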

bq. I'm not sure I follow the approximation of oldest sequence id.

This was explained in the posts between July 23 and July 30. We can explain it again if required.


bq. Do you have a rig where you can try out your implementation apart from running it inside
a regionserver?

What do you mean by a rig? If you mean a benchmark environment, then no. If you mean tests, then these are included in the patch.

bq. we talking about adding one more thread – a compacting thread – per Store?

In the new design, the threads are run by the region server executor.

bq. On MemstoreScanner, we are keeping the fact that the implementation is crossing Segments
an internal implementation detail?

Yes.

bq. I suppose you'll deliver a skiplist version first and then move on to work on in-memory
storefile, a more compact in-memory representation?

This is a task that should definitely be completed; HBASE-10713 is a good starting point.

bq. Seems like the whole notion of snapshot should not be exposed to the client. It is an
implementation detail of the original memstore, the defaultmemstore, something that we should
try not expose.

Agreed; however, this seems out of the scope of the current Jira, which focuses on in-memory compaction.


> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch,
HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch,
HBASE-13408-trunk-v08.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf,
HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf,
InMemoryMemstoreCompactionMasterEvaluationResults.pdf, InMemoryMemstoreCompactionScansEvaluationResults.pdf,
StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its in-memory component.
The memstore absorbs all updates to the store; from time to time these updates are flushed
to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted
until it is written to the filesystem and optionally to block-cache. This may result in underutilization
of the memory due to duplicate entries per row, for example, when hot data is continuously
updated. 
> Generally, the faster the data accumulates in memory, the more flushes are triggered and the more frequently the data sinks to disk, slowing down retrieval of the data, even if it is very recent.
> In high-churn workloads, compacting the memstore can help maintain the data in memory,
and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.	The data is kept in memory for as long as possible
> 2.	Memstore data is either compacted or in process of being compacted 
> 3.	Allow a panic mode, which may interrupt an in-progress compaction and force a flush
of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.


