hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14921) Memory optimizations
Date Sat, 26 Mar 2016 06:26:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212838#comment-15212838

stack commented on HBASE-14921:

Reviewed again after another read of the attached diagram.

Does the 'CellBlock' Map have to be a Concurrent Map? This is a readonly datastructure once
made? No need of the 'Concurrent'?

If so, you made CellSet (or Eshcar) made CellSet which implements NaviableSet. Here we are
slinging Cells and implement NavigableMap. Call it CellMap rather than CellBock? I was trying
to come up with a name that made reference to how it is actually implemented -- e.g. like
ConcurrentSkipListMap -- which would seem to say all it CellArrayMap rather than CellBlock
but when I see CellBlockObjectArray later..  may be CellBinSearchMap....

It is odd that comparator is passed on construction and when we call createCellBlocks. Should
constructor be protected so folks can't construct one w/o there being data in it? Should we
save the comparator when createCellBlocks rather than pass on construction?  Then we have
to make sure the comparators agree? Or do we? Seems like you need it on construction so can
propagate instances with the createCellBlocks call. Seems like allowing passing different
comparators would make for trouble so don't allow passing comparator on createCellBlocks?

Why name the method getCellFromIndex(int i)? Should it be get(i) because we are getting the
ith Cell in Map? (The comment says that this is what it is doing)

It'd be cool to compare a CSLM to one of these CAMs at different scales to see difference
in perf in a test bench.

For CellBlockObjectArray, it should be called CellArrayMap?

What does "CellBlockObjectArray is a simple array of Cells allocated using JVM." mean? All
our stuff is on a JVM (smile)

Below should be final

  Cell[] block;

Is the below total Cels?

  private int numOfCellsInsideChunk;

It looks like it is Cells per chunk? This is constant for all Chunks in a CellBlockSerialized?

(BTW CellBlockSerialized could be ChunkCellMap or CellMLABMap

This seems to be the 'key', or 'index key' size, not bytes in a cell?: private static final
int BYTES_IN_CELL = 3*(Integer.SIZE / Byte.SIZE); // each Cell requires 3 integers

I love the way this works. Very nice. Simple. Only bit is how you ensure all Chunks have same
amount of Cells?

Remove this stuff rather than just comment it out:

     //c = (chunkPool != null) ? chunkPool.getChunk() : new Chunk(chunkSize, 5); //14921

The 14921 everywhere should be stripped. Its HBASE-14921? 

Is this a hardcoding?

201	        c = new Chunk(chunkSize, 5);

The Chunk Pool needs to be on for this all to work? Is that in this patch?


   * Only used in testing

we usually mark with @VisibleForTestiing annotation.

Patch looks great.

> Memory optimizations
> --------------------
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, CellBlocksSegmentinthecontextofMemStore(1).pdf,
HBASE-14921-V01.patch, HBASE-14921-V02.patch
> Memory optimizations including compressed format representation and offheap allocations

This message was sent by Atlassian JIRA

View raw message