hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17338) Treat Cell data size under global memstore heap size only when that Cell can not be copied to MSLAB
Date Tue, 28 Feb 2017 00:48:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886946#comment-15886946 ]

stack commented on HBASE-17338:

Trying to follow-along....

bq. We track dataSize (irrespective of whether the cell data is in the on heap or off heap area). This dataSize
has been used at Segment level for in memory flush decision, Region level for on disk flushes
and globally to force flush some regions.

Dumb question. dataSize is KV infrastructure + key content + value + trailing tags and sequenceid
if any? i.e. the whole KV? And CellSize is infrastructure only or rather key+infrastructure?

bq. At the 1st 2 levels, there is no doubt that we have to track all the cell data size together.
Now the point Ram says is when we have off heap configured and max off heap global size is
say 12 GB, once the data size globally reaches this level, we will force flush some regions.
So his point is for this tracking, we have to consider only off heap Cells and on heap Cell's
data size should not get accounted in the data size but only in the heapSize. (At global level.
But at region and segment level it has to get applied). 2 reasons why I am not in favor of

I'm listening (smile).

bq. 1. This makes the impl so complex. We need to add isOffheap check down the layers. Also
at 2 layers we have to consider these on heap cell data size and one level not.

You can probably guess what I think on the above.


bq. So let's consider the cell data size globally also (how we do now) and make global flushes.

There is one global threshold whether data is onheap or offheap (I probably got this wrong?)

bq. We should be able to turn MSLAB usage ON/OFF per table also. Now this is possible? Am
not sure. 

We probably could, but the direct memory would remain allocated until we restart.

Thanks. Let's figure it out and update your doc: https://docs.google.com/document/d/1fj5P8JeutQ-Uadb29ChDscMuMaJqaMNRI86C4k5S1rQ/edit#heading=h.x14v1a3zw2q9

> Treat Cell data size under global memstore heap size only when that Cell can not be copied
> ---------------------------------------------------------------------------------------------------
>                 Key: HBASE-17338
>                 URL: https://issues.apache.org/jira/browse/HBASE-17338
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>    Affects Versions: 2.0.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>         Attachments: HBASE-17338.patch, HBASE-17338_V2.patch, HBASE-17338_V2.patch, HBASE-17338_V4.patch,
> We have only data size and heap overhead being tracked globally.  Off heap memstore works
with an off heap backed MSLAB pool.  But a cell, when added to memstore, is not always copied
to MSLAB.  Append/Increment ops doing an upsert don't use MSLAB.  Also, based on the Cell size,
we sometimes avoid the MSLAB copy.  But now we track the data size of these cells also under the global
memstore data size, which indicates off heap size in case of off heap memstore.  For global
checks for flushes (against lower/upper watermark levels), we check this size against max
off heap memstore size.  We do check heap overhead against global heap memstore size (defaults
to 40% of xmx).  But for such cells the data size also should be accounted under the heap overhead.
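
To make the accounting in the description above concrete, here is a minimal sketch of the proposed behaviour for the off heap memstore case only. It is not the patch and not HBase's actual code: the class name, method names and the copiedToMslab flag are all hypothetical, and the limits are plain parameters rather than real configuration lookups.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of global memstore accounting with an off heap memstore.
 * All names here are made up for illustration; this is not an HBase class.
 */
final class GlobalMemstoreSizeSketch {
    // Data held in the off heap MSLAB pool; compared against the max off heap memstore size.
    private final AtomicLong offHeapDataSize = new AtomicLong();
    // Everything living on the Java heap; compared against the global heap memstore size.
    private final AtomicLong heapSize = new AtomicLong();

    /**
     * Account for one cell added to a memstore.
     *
     * @param cellDataSize  size of the cell data
     * @param heapOverhead  on heap overhead of the cell
     * @param copiedToMslab false for cells that skip the MSLAB copy
     *                      (e.g. upserts, or cells skipped because of their size)
     */
    void addCell(long cellDataSize, long heapOverhead, boolean copiedToMslab) {
        if (copiedToMslab) {
            // Data went off heap; only the overhead stays on heap.
            offHeapDataSize.addAndGet(cellDataSize);
            heapSize.addAndGet(heapOverhead);
        } else {
            // Data stayed on heap, so count it together with the heap overhead.
            heapSize.addAndGet(cellDataSize + heapOverhead);
        }
    }

    long offHeapDataSize() {
        return offHeapDataSize.get();
    }

    long heapSize() {
        return heapSize.get();
    }

    /** True when either global limit is reached, i.e. some regions should be flushed. */
    boolean aboveGlobalLimit(long maxOffHeapSize, long maxHeapSize) {
        return offHeapDataSize.get() >= maxOffHeapSize || heapSize.get() >= maxHeapSize;
    }
}
```

With this sketch, a cell that is not copied to MSLAB never moves the off heap counter, so it can only trigger a global flush through the heap limit. Region and segment level tracking, which the comments say must still see all cell data together, is deliberately left out.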

This message was sent by Atlassian JIRA
