hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16747) Track memstore data size and heap overhead separately
Date Thu, 27 Oct 2016 11:08:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611548#comment-15611548 ]

Anoop Sam John commented on HBASE-16747:

Tests look good now. [~saint.ack@gmail.com] I fixed all your comments in RB. The patch has changed
since the first version you reviewed. The major change is in the test code, to make the tests pass.
The memstore size assertions in many tests have changed, and some tests also needed a different
flush size etc., because we no longer track the heap overhead as part of the memstore size.
Would you like to see the changes in the current patch in RB compared with the first one?
[~yuzhihong@gmail.com] I have replied to your question on RB. Please let me know if the comment is still valid.
I would like to get a +1 here so that I can continue with the other subtasks.

> Track memstore data size and heap overhead separately 
> ------------------------------------------------------
>                 Key: HBASE-16747
>                 URL: https://issues.apache.org/jira/browse/HBASE-16747
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>         Attachments: HBASE-16747.patch, HBASE-16747.patch, HBASE-16747_V2.patch, HBASE-16747_V2.patch, HBASE-16747_V3.patch, HBASE-16747_V3.patch, HBASE-16747_V3.patch, HBASE-16747_V4.patch, HBASE-16747_WIP.patch
> We track the memstore size in 3 places.
> 1. Globally, at the RS level, in RegionServerAccounting. This tracks the size of all memstores and is used to decide whether forced flushes are needed because of global heap pressure.
> 2. At the region level, in HRegion. This is the sum of the sizes of all memstores within the region and is used to decide whether the region has reached its flush size (128 MB).
> 3. At the segment level. This size drives the in-memory flush/compaction decisions.
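The three levels listed above can be pictured with a minimal sketch of the accounting as it stands before this change. It assumes hypothetical names (ThreeLevelAccountingSketch, onCellAdded and the three *FlushNeeded checks are illustrations, not HBase's RegionServerAccounting or HRegion API) and, for brevity, a single region holding a single segment. Every added cell bumps all three counters by the same combined per-cell size, and each level feeds a different flush decision.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of the pre-change accounting: one combined per-cell size
 * is added at all three levels. Names are illustrative, not HBase's API.
 */
public final class ThreeLevelAccountingSketch {
    final AtomicLong globalSize = new AtomicLong();  // level 1: whole region server
    final AtomicLong regionSize = new AtomicLong();  // level 2: one region (only one modeled here)
    final AtomicLong segmentSize = new AtomicLong(); // level 3: one memstore segment

    /** Every added cell bumps all three counters by the same combined size. */
    void onCellAdded(long cellSize) {
        segmentSize.addAndGet(cellSize);
        regionSize.addAndGet(cellSize);
        globalSize.addAndGet(cellSize);
    }

    /** Level 1 decision: forced flushes under global heap pressure. */
    boolean globalFlushNeeded(long globalLimit) {
        return globalSize.get() > globalLimit;
    }

    /** Level 2 decision: the region has reached its configured flush size. */
    boolean regionFlushNeeded(long flushSize) {
        return regionSize.get() >= flushSize;
    }

    /** Level 3 decision: in-memory flush/compaction of the segment. */
    boolean inMemoryFlushNeeded(long threshold) {
        return segmentSize.get() >= threshold;
    }

    public static void main(String[] args) {
        ThreeLevelAccountingSketch acct = new ThreeLevelAccountingSketch();
        acct.onCellAdded(138);
        System.out.println(acct.regionFlushNeeded(128)); // true
    }
}
```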
> All of these use the Cell's heap size, which includes the data bytes as well as the Cell object's heap overhead. We also include the overhead of adding Cells to the Segment's data structures (like the ConcurrentSkipListMap, CSLM).
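To make that composition concrete, here is a sketch with made-up overhead constants. CELL_OBJECT_OVERHEAD and CSLM_ENTRY_OVERHEAD are hypothetical values chosen only for the arithmetic; real figures depend on the JVM, pointer compression and the Cell implementation. The point is that the single "heap size" figure is data bytes plus per-cell overhead.

```java
/** Hypothetical per-cell size breakdown; the overhead constants are illustrative only. */
public final class CellHeapSizeSketch {
    // Assumed values; real numbers depend on the JVM and the Cell implementation.
    static final long CELL_OBJECT_OVERHEAD = 48; // Cell object header and fields
    static final long CSLM_ENTRY_OVERHEAD = 40;  // skip-list entry cost per inserted Cell

    /** Bytes of actual key/value data the cell carries. */
    static long dataSize(int keyLength, int valueLength) {
        return (long) keyLength + valueLength;
    }

    /** Heap overhead: everything on the heap except the data bytes themselves. */
    static long heapOverhead() {
        return CELL_OBJECT_OVERHEAD + CSLM_ENTRY_OVERHEAD;
    }

    /** The single combined figure that all three levels currently track. */
    static long heapSize(int keyLength, int valueLength) {
        return dataSize(keyLength, valueLength) + heapOverhead();
    }

    public static void main(String[] args) {
        System.out.println(dataSize(30, 20)); // 50
        System.out.println(heapSize(30, 20)); // 138
    }
}
```

With an off heap memstore the 50 data bytes would live outside the Java heap while the 88 bytes of overhead would not, which is why one combined number stops being meaningful.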
> Once we have an off heap memstore, we will keep the cell data bytes in the off heap area, so we cannot track data size and heap overhead as one entity. We need to track them separately.
> The proposal here is to track the cell data size and the heap overhead separately at the global accounting layer. As of now we have only an on heap memstore, so the global memstore boundary checks will consider both (add them up and check the sum against the global max memstore size).
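A minimal sketch of the proposed global accounting, assuming a hypothetical GlobalMemstoreAccountingSketch class with an illustrative incMemstoreSize(dataDelta, overheadDelta) method (neither is HBase's RegionServerAccounting API). The two counters are kept apart, and while the memstore is on heap only, the boundary check adds them up.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical global accounting with separate data-size and heap-overhead counters. */
public final class GlobalMemstoreAccountingSketch {
    private final AtomicLong dataSize = new AtomicLong();
    private final AtomicLong heapOverhead = new AtomicLong();
    private final long globalMemstoreLimit;

    GlobalMemstoreAccountingSketch(long globalMemstoreLimit) {
        this.globalMemstoreLimit = globalMemstoreLimit;
    }

    /** Record a change in both components; negative deltas are used after a flush. */
    void incMemstoreSize(long dataDelta, long overheadDelta) {
        dataSize.addAndGet(dataDelta);
        heapOverhead.addAndGet(overheadDelta);
    }

    long getDataSize() {
        return dataSize.get();
    }

    long getHeapOverhead() {
        return heapOverhead.get();
    }

    /**
     * With only an on heap memstore, both parts occupy the Java heap,
     * so the global boundary check sums them.
     */
    boolean isAboveGlobalLimit() {
        return dataSize.get() + heapOverhead.get() > globalMemstoreLimit;
    }

    public static void main(String[] args) {
        GlobalMemstoreAccountingSketch acct = new GlobalMemstoreAccountingSketch(1000);
        acct.incMemstoreSize(800, 150);
        System.out.println(acct.isAboveGlobalLimit()); // false: 950 <= 1000
        acct.incMemstoreSize(40, 20);
        System.out.println(acct.isAboveGlobalLimit()); // true: 1010 > 1000
    }
}
```

Reading the two AtomicLongs one after the other is not an atomic snapshot. For a heap-pressure check that small raciness is likely tolerable, but it is a design choice worth keeping in mind.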
> Track the cell data size alone (this can be on heap or off heap) at the region level. Region flushes use the cell data size alone for the flush decision. A user who configures 128 MB as the flush size will normally expect about 128 MB of data to be flushed. But because we were including the heap overhead too, the actual data size flushed was well below 128 MB. With this change we will behave more like what a user expects.
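The effect on region flushes can be worked through with hypothetical numbers: assume every cell carries 50 bytes of data and 78 bytes of heap overhead (both figures invented for the arithmetic). If the region counter includes the overhead, 128 MB is reached after 1,048,576 cells, which hold only 50 MB of data. If it counts data alone, the flush carries at least 128 MB of data. The class and method names below are illustrative only.

```java
/** Hypothetical arithmetic: how much cell data a 128 MB region flush carries. */
public final class RegionFlushSizeSketch {
    static final long MB = 1024L * 1024L;

    /** Number of equal-sized cells added before the tracked size reaches flushSize. */
    static long cellsUntilFlush(long flushSize, long trackedBytesPerCell) {
        return (flushSize + trackedBytesPerCell - 1) / trackedBytesPerCell; // ceiling division
    }

    /** Data bytes actually flushed when each cell adds trackedBytesPerCell to the region counter. */
    static long dataFlushed(long flushSize, long dataPerCell, long trackedBytesPerCell) {
        return cellsUntilFlush(flushSize, trackedBytesPerCell) * dataPerCell;
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;
        long dataPerCell = 50;     // assumed cell data bytes
        long overheadPerCell = 78; // assumed heap overhead per cell
        // Old behaviour: the region counter includes the heap overhead.
        long oldData = dataFlushed(flushSize, dataPerCell, dataPerCell + overheadPerCell);
        // New behaviour: the region counter tracks cell data size alone.
        long newData = dataFlushed(flushSize, dataPerCell, dataPerCell);
        System.out.println(oldData / MB);         // 50
        System.out.println(newData >= flushSize); // true
    }
}
```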
> In-memory flush/compaction at the segment level also considers the cell data size alone, but we will need to track the heap overhead too. (Once an in-memory flush or a normal flush happens, we will have to adjust both the cell data size and the heap overhead.)
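A minimal sketch of segment-level tracking, assuming a hypothetical Sizes pair and helper methods (none of these are HBase classes). The in-memory flush decision looks only at the segment's data size, while a flush subtracts both components from the region totals so that neither counter drifts.

```java
/** Hypothetical sketch: a segment and its region each keep data size and heap overhead apart. */
public final class SegmentFlushSketch {

    /** A pair of counters; mutable for brevity and not thread-safe. */
    static final class Sizes {
        long dataSize;
        long heapOverhead;

        void add(long data, long overhead) {
            dataSize += data;
            heapOverhead += overhead;
        }
    }

    /** Add one cell to a segment and mirror the change into the region totals. */
    static void addCell(Sizes segment, Sizes region, long cellData, long cellOverhead) {
        segment.add(cellData, cellOverhead);
        region.add(cellData, cellOverhead);
    }

    /** In-memory flush/compaction decision based on cell data size alone. */
    static boolean shouldFlushInMemory(Sizes segment, long threshold) {
        return segment.dataSize >= threshold;
    }

    /** On flush, both components are subtracted from the region totals, then the segment is reset. */
    static void flush(Sizes segment, Sizes region) {
        region.add(-segment.dataSize, -segment.heapOverhead);
        segment.dataSize = 0;
        segment.heapOverhead = 0;
    }

    public static void main(String[] args) {
        Sizes region = new Sizes();
        Sizes segment = new Sizes();
        addCell(segment, region, 50, 78);
        addCell(segment, region, 70, 78);
        System.out.println(shouldFlushInMemory(segment, 100)); // true: 120 >= 100
        flush(segment, region);
        System.out.println(region.dataSize + " " + region.heapOverhead); // 0 0
    }
}
```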

This message was sent by Atlassian JIRA
