hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14921) Memory optimizations
Date Tue, 29 Mar 2016 13:29:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215997#comment-15215997 ]

Anastasia Braginsky commented on HBASE-14921:

bq. Please say more on this. CellSet is a NavigableMap (not a ConcurrentNavigableMap) so I'm
missing where we need the 'Concurrent' (is it in this patch?)

Indeed, CellSet itself is not a ConcurrentNavigableMap (in the code it implements NavigableSet<Cell>).
However, CellSet has a field, “delegatee”, whose type is ConcurrentNavigableMap. We want CellSet
to be able to wrap different types of delegatees, but they all need to be ConcurrentNavigableMaps.
Here is the relevant code:


public class CellSet implements NavigableSet<Cell>  {
  // Implemented on top of a {@link java.util.concurrent.ConcurrentSkipListMap}
  // Differ from CSLS in one respect, where CSLS does "Adds the specified element to this
  // set if it is not already present.", this implementation "Adds the specified element
  // to this set if it is already present overwriting what was there previous".
  // Otherwise, has same attributes as ConcurrentSkipListSet
  private final ConcurrentNavigableMap<Cell, Cell> delegatee;

  CellSet(final ConcurrentNavigableMap<Cell, Cell> m) { this.delegatee = m;}
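
To illustrate the point about delegatees, here is a minimal, self-contained sketch of the same pattern. It is not the HBase class: OverwritingSet is a hypothetical, simplified stand-in for CellSet, and plain Strings stand in for Cells. It shows a set-like wrapper that accepts any ConcurrentNavigableMap as its delegatee and whose add() overwrites an equal element instead of keeping the old one.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified stand-in for CellSet (not the HBase class):
// a set view over a ConcurrentNavigableMap that maps each element to itself.
class OverwritingSet<E> implements Iterable<E> {
  // Any ConcurrentNavigableMap implementation can be plugged in here.
  private final ConcurrentNavigableMap<E, E> delegatee;

  OverwritingSet(ConcurrentNavigableMap<E, E> m) {
    this.delegatee = m;
  }

  // Unlike ConcurrentSkipListSet.add, an equal element that is already
  // present is replaced. Returns true only if no equal element existed.
  boolean add(E e) {
    return delegatee.put(e, e) == null;
  }

  // Returns the smallest stored element, or null if the set is empty.
  E first() {
    Map.Entry<E, E> entry = delegatee.firstEntry();
    return entry == null ? null : entry.getValue();
  }

  int size() {
    return delegatee.size();
  }

  // Iterates over the stored values, i.e. the most recently added elements.
  @Override
  public Iterator<E> iterator() {
    return delegatee.values().iterator();
  }
}

class OverwritingSetDemo {
  public static void main(String[] args) {
    // A case-insensitive comparator makes "row1" and "ROW1" equal keys.
    OverwritingSet<String> set = new OverwritingSet<>(
        new ConcurrentSkipListMap<String, String>(String.CASE_INSENSITIVE_ORDER));
    set.add("row1");
    set.add("ROW1"); // overwrites the earlier, equal element
    for (String s : set) {
      System.out.println(s);
    }
  }
}
```

Because the wrapper depends only on the ConcurrentNavigableMap interface, a different map implementation can replace the skip list without changing the wrapper.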

bq. Your new names are better. I considered 'flat' Map but shied away given its meaning over
in spark/scala; I think it will be ok as long as you stick why its a 'flat' map in the javadoc
on CellFlatMap.

I’ll change the names and add explanations.

bq. How do you see this working? We do not control the size of inbound Cells. They could have
some regularity and they could also be erratic to the extreme (What to do when a 1G cell arrives
into a column family that up to this has been taking on metrics?)

Excellent comment! Indeed, we have a problem with Cells bigger than Chunks. So we have no choice
but to introduce special variable-size, very large Chunks to support very large Cells.
We’ll improve the code after the basic benchmarking.
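
As a rough sketch of what such handling could look like (this is not code from the patch; SketchChunkAllocator, its chunk size and its counter are all hypothetical), an allocator can copy ordinary Cells into fixed-size chunks and give any Cell larger than a chunk its own dedicated, exactly sized chunk:

```java
import java.nio.ByteBuffer;

// Hypothetical allocator sketch; names and sizes are illustrative only and
// are not taken from the HBASE-14921 patch.
class SketchChunkAllocator {
  private final int chunkSize;
  private ByteBuffer current;      // the fixed-size chunk currently being filled
  private int oversizedChunks = 0; // number of dedicated variable-size chunks

  SketchChunkAllocator(int chunkSize) {
    this.chunkSize = chunkSize;
    this.current = ByteBuffer.allocate(chunkSize);
  }

  // Copies the cell bytes into chunk memory and returns a view of the copy.
  ByteBuffer copyCell(byte[] cell) {
    if (cell.length > chunkSize) {
      // The cell cannot fit into any regular chunk: allocate a dedicated
      // chunk sized to the cell.
      oversizedChunks++;
      ByteBuffer big = ByteBuffer.allocate(cell.length);
      big.put(cell);
      big.flip();
      return big;
    }
    if (current.remaining() < cell.length) {
      // Not enough room left in this chunk; start a new fixed-size chunk.
      current = ByteBuffer.allocate(chunkSize);
    }
    int start = current.position();
    current.put(cell);
    // Build a view limited to the bytes that were just written.
    ByteBuffer view = current.duplicate();
    view.position(start);
    view.limit(start + cell.length);
    return view.slice();
  }

  int oversizedChunkCount() {
    return oversizedChunks;
  }
}
```

The trade-off in this sketch is that dedicated chunks are not uniform in size, so they could not be recycled through a fixed-size chunk pool in the same way as regular chunks.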

bq. I still do not see how the 3 * int is BYTES_IN_CELL. Not important.

I think the problem here (and also in some other questions) is the name “Cell”: CellFlatMap
doesn’t work with “Cell data” or with “true Cells” as you are (correctly) using the word.
CellFlatMap works with a “cell representation”: from those 3 integers you can get all the
other Cell information, which is the “true Cell”. Should I change this
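
To make the idea of a three-integer “cell representation” concrete, here is a small hypothetical sketch. The comment above does not say what the three integers are; the layout used here (chunk index, offset within the chunk, length) is an assumption for illustration only, and SketchCellIndex is not a class from the patch.

```java
import java.util.Arrays;

// Hypothetical sketch: each cell is represented by three ints in a flat array.
// ASSUMPTION: the ints are (chunk index, offset in chunk, length); the
// original discussion only states that there are three integers per cell.
class SketchCellIndex {
  static final int INTS_PER_CELL = 3;

  private final byte[][] chunks; // the memory that actually holds cell data
  private final int[] index;     // cell i uses index[3*i .. 3*i + 2]

  SketchCellIndex(byte[][] chunks, int[] index) {
    if (index.length % INTS_PER_CELL != 0) {
      throw new IllegalArgumentException("index must hold 3 ints per cell");
    }
    this.chunks = chunks;
    this.index = index;
  }

  int size() {
    return index.length / INTS_PER_CELL;
  }

  // Resolves the i-th representation to the bytes of the "true" cell.
  byte[] cellBytes(int i) {
    int base = i * INTS_PER_CELL;
    byte[] chunk = chunks[index[base]];
    int offset = index[base + 1];
    int length = index[base + 2];
    return Arrays.copyOfRange(chunk, offset, offset + length);
  }
}
```

In this sketch the index costs 3 * 4 = 12 bytes per cell regardless of how large the cell data is, which is one way to read a per-cell constant such as BYTES_IN_CELL.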


It was introduced and off by default as is usual when new features. But as also happens this
is our practice, the facility was 'forgotten'. It came up then when our Lars noticed it and
wanted to remove it since it was not being used. It came up again recently in HBASE-15513

It would seem to make sense enabling it by default if we come up w/ a proper sizing. Having
it on seems to mess w/ G1GC too. Would need to figure that.

I took a look at HBASE-15513; it is very interesting. It looks like it favors turning ChunkPool
on by default. That also looks very reasonable to me. I also took a very brief look at HBASE-15180,
specifically at this statement:
bq. I noticed about 5-10% improvement on GC times and CPU utilization after disabling MSLAB
only if using G1GC. Tuning MSLAB helps a little but I don't see to much advantage to have
it enabled when G1GC is there.
However, I do not see enough evidence in those measurements. How many workloads were tested?
What were the sizes of the Cells? I need to read this Jira more carefully.

bq. We need to do up a memory management doc. Between your work on Segments, Segment pipelines,
MSLAB chunks, chunk pools and bytebufferpools to host requests read from sockets, bucket cache
and reference counting bucketcache bucket blocks at read time, it would be good if we had
a map so we could trace a Cell on its travels.

I’ll write the document a little later on.

> Memory optimizations
> --------------------
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, CellBlocksSegmentinthecontextofMemStore(1).pdf,
HBASE-14921-V01.patch, HBASE-14921-V02.patch
> Memory optimizations including compressed format representation and offheap allocations

This message was sent by Atlassian JIRA
