hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14921) Memory optimizations
Date Thu, 21 Jul 2016 10:07:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387470#comment-15387470 ]

Anastasia Braginsky commented on HBASE-14921:

Thank you [~anoop.hbase] for your very reasonable comments!

bq. But when the use case is like some thing of time series data, where we really dont expect
duplicates/updates, it might be better to turn off compaction and do only flatten.

Do you suggest making an externally editable flag for turning compaction on and off? If so,
what should its default value be? Didn't we want sysadmins to deal less with all those flags
and settings (we already have plenty)? If the compaction pre-check scan appears to hurt
performance, we can run it only on every second (or every Xth) flush to pipeline.
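The pre-check cadence could be amortized along these lines (a minimal Java sketch; the class and method names are illustrative, not actual HBase identifiers):

```java
// Hypothetical sketch: run the in-memory compaction pre-check scan only on
// every Xth in-memory flush, so its cost is amortized when compaction turns
// out not to help. Names are illustrative, not real HBase code.
public class PreCheckCadence {
    private final int interval; // e.g. 2 => pre-check on every 2nd flush
    private int flushCount = 0;

    public PreCheckCadence(int interval) {
        this.interval = interval;
    }

    // Called on each flush of the active segment into the pipeline.
    public boolean shouldRunPreCheck() {
        flushCount++;
        return flushCount % interval == 0;
    }
}
```

With interval 1 this degenerates to the current behavior of checking on every flush.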

bq. Again flatten to CellChunkMap would be ideal as that will release and reduce heap memory
footprint for this memstore considerably. CellArrayMap, yes it reduces but not much. 

CellChunkMap is valuable because it can be taken off-heap, but it does not significantly
reduce memory usage compared to CellArrayMap. All that you save memory-wise in CellChunkMap
is that the Cell object is now "embedded" as part of the array, so you do not need the reference
and the object overhead. The difference between CellArrayMap and CellChunkMap is therefore
24 bytes per Cell.
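The 24 bytes can be accounted for roughly as follows (a back-of-the-envelope sketch assuming a 64-bit JVM without compressed oops; the constants are assumptions, not measured values):

```java
// Rough per-Cell accounting for CellArrayMap vs CellChunkMap on a 64-bit JVM
// without compressed oops (assumed sizes, not measurements):
//   - CellArrayMap keeps an array of references to Cell objects, so each entry
//     pays an 8-byte reference plus the Cell's 16-byte object header.
//   - CellChunkMap serializes the cell data into a chunk, dropping both.
public class PerCellSaving {
    static final int OBJECT_HEADER_BYTES = 16; // header of the Cell object
    static final int REFERENCE_BYTES = 8;      // reference held in the array

    static int saving() {
        return OBJECT_HEADER_BYTES + REFERENCE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(saving() + " bytes per Cell");
    }
}
```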

bq. In your usecase, the max adv you get because of the compaction as many cells will get

I do not agree. In our experiments we deliberately use a uniform distribution with a small data
size, so we have few duplicates. We still see that the compaction has little impact on
the performance.

bq. My another concern is regarding the fact that in this memstore only the tail of the pipeline
getting flushed to disk when a flush request comes. In 1st version it was like always the
compaction happens. So all chances that the tail of pipeline is much bigger sized and so that
much data gets flushed. Now when compaction is not at all happening and we do have many small
sized segments in pipeline, it would have been better to flush all the segments to disk that
making small sized flushes. I raised this concern at first step also. But then the counter
was that the compaction happens always but now it is not the case.

I remember this concern of yours from the code review. It is a valid concern and we are
thinking about it. Apparently, this is one more reason to run compactions (at least merges)
once in a while, e.g. once the pipeline holds 10 segments. If we simply flush everything
to disk we will create many small files, and their compaction will then have to run on
disk...
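A threshold-driven merge could look roughly like this (a sketch only; the segment representation and the threshold of 10 are taken from the example above, not from real HBase code):

```java
import java.util.List;

// Sketch: once the pipeline accumulates MERGE_THRESHOLD segments, merge them
// into one (segment sizes are summed here as a stand-in for the real merge),
// so a later flush writes one large file instead of many small ones.
public class PipelineMergeSketch {
    static final int MERGE_THRESHOLD = 10; // example value from the discussion

    static List<Long> maybeMerge(List<Long> segmentSizes) {
        if (segmentSizes.size() < MERGE_THRESHOLD) {
            return segmentSizes; // too few segments, leave the pipeline as is
        }
        long total = segmentSizes.stream().mapToLong(Long::longValue).sum();
        return List.of(total); // one merged segment
    }
}
```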

bq. JFYI.. There is a periodic memstore flush checking. If we accumulate more than 30 million
edits in memstore, we will flush

We know there is a flush to disk roughly once an hour. The main reason for that is the WAL,
right? Otherwise, why would we care how many cells are in memory? Actually, maybe in this case
we do not want to flush absolutely everything to disk; flushing just the oldest part, so the
WAL can truncate a bit, is enough?
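The partial-flush idea could be sketched as follows (hypothetical types; real HBase tracks sequence ids differently):

```java
import java.util.Comparator;
import java.util.List;

// Hedged sketch: instead of flushing the whole memstore on the periodic check,
// flush only the oldest pipeline segment; the WAL can then truncate all edits
// below the smallest sequence id still held in memory. Segment here is a
// hypothetical stand-in, not a real HBase class.
public class PartialFlushSketch {
    record Segment(long minSeqId, long sizeBytes) {}

    // Pick the oldest segment (smallest minimum sequence id) to flush first.
    static Segment pickFlushCandidate(List<Segment> pipeline) {
        return pipeline.stream()
                .min(Comparator.comparingLong(Segment::minSeqId))
                .orElseThrow();
    }
}
```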

> Memory optimizations
> --------------------
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, CellBlocksSegmentinthecontextofMemStore(1).pdf,
HBASE-14921-V01.patch, HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch,
HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch, HBASE-14921-V06-CAO.patch, InitialCellArrayMapEvaluation.pdf,
> Memory optimizations including compressed format representation and offheap allocations

This message was sent by Atlassian JIRA
