hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
Date Fri, 29 Jan 2016 03:28:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122878#comment-15122878 ]

Anoop Sam John edited comment on HBASE-15180 at 1/29/16 3:28 AM:
-----------------------------------------------------------------

You mean making the Codec decoder work over a byte[] as well, rather than only on an InputStream?
That is nice. The reason I did not go down that path is that my next work is E2E off heap in the
write path as well. So when a write req comes in, we might be reading it into an off heap BB
(from a pool), and then byte[] based decoding is not possible. Maybe BB based then. Also, I am
trying to experiment with reading the req into more than one single large BB. From the pool we
might get fixed-size smaller BBs (say 64 KB or so) and read the req into as many of those as needed.
The CellScanner then needs to work on a set of BBs (like the MultiByteBuff stuff), at which point
even a BB based API becomes an issue. Continuing with an InputStream based API gives
us the freedom to experiment with these different data structures.
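
To illustrate why a plain single-BB API stops fitting once a request is spread over several pooled buffers, here is a minimal sketch of reading an int that may straddle two buffers. The class `MultiBufferReader` and its methods are hypothetical names for illustration only; this is not HBase's MultiByteBuff, and it assumes every buffer uses the default big-endian byte order.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

/** Hypothetical, simplified multi-buffer reader; NOT HBase's MultiByteBuff. */
class MultiBufferReader {
  private final ByteBuffer[] buffers;
  private int current = 0; // index of the buffer currently being consumed

  MultiBufferReader(ByteBuffer... buffers) {
    this.buffers = buffers;
  }

  /** Returns the next byte, moving on to the following buffer when one is exhausted. */
  byte get() {
    while (current < buffers.length && !buffers[current].hasRemaining()) {
      current++;
    }
    if (current == buffers.length) {
      throw new BufferUnderflowException();
    }
    return buffers[current].get();
  }

  /** Reads a 4-byte big-endian int, even when it spans a buffer boundary. */
  int getInt() {
    // Fast path: the whole int sits in the current buffer (assumes big-endian order).
    if (current < buffers.length && buffers[current].remaining() >= 4) {
      return buffers[current].getInt();
    }
    // Slow path: assemble the int byte by byte across buffers.
    int value = 0;
    for (int i = 0; i < 4; i++) {
      value = (value << 8) | (get() & 0xFF);
    }
    return value;
  }
}
```

The same code works for heap and direct (off heap) buffers, since it only uses the `ByteBuffer` API and never asks for a backing array.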

bq.Should we default to MSLAB for good? I don't think anybody runs with MSLAB off.
Yes, we have MSLAB enabled by default. I agree that doing the MSLAB check in the RPC layer looks
ugly. I wanted to avoid memstore cells referring to the byte[] the req was read into when someone
turns MSLAB off. So what do you say? Remove this check?

bq.Can the byte[4]'s be statically allocated?
You mean this?
{code}
 // Buffer used to read an int from the stream
+  private byte[] intBuf = null;
...
if (intBuf == null) {
+      // Lazy init. In real flow, we will use the readCell(int, boolean) API only
+      intBuf = new byte[Bytes.SIZEOF_INT];
+    }
{code}
The byte[4] has to be an instance-level variable.
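
As a rough sketch of the instance-level buffer idea, the class below reuses one 4-byte scratch array per reader instead of allocating a new one for every length read. `LengthPrefixedReader` and `readInt` are hypothetical names, not the classes from the patch, and the buffer is allocated eagerly here rather than lazily as in the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader for illustration; not the class from the patch. */
class LengthPrefixedReader {
  // Scratch buffer owned by this instance and reused for every int read.
  // It is not static: separate reader instances must not share it.
  private final byte[] intBuf = new byte[4];
  private final InputStream in;

  LengthPrefixedReader(InputStream in) {
    this.in = in;
  }

  /** Reads a 4-byte big-endian int without allocating a new byte[] per call. */
  int readInt() throws IOException {
    int read = 0;
    while (read < intBuf.length) {
      int n = in.read(intBuf, read, intBuf.length - read);
      if (n < 0) {
        throw new EOFException("stream ended inside a length prefix");
      }
      read += n;
    }
    return ((intBuf[0] & 0xFF) << 24) | ((intBuf[1] & 0xFF) << 16)
        | ((intBuf[2] & 0xFF) << 8) | (intBuf[3] & 0xFF);
  }
}
```

Each instance is expected to be read by one thread at a time, so the per-instance buffer needs no synchronization in this sketch.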


was (Author: anoop.hbase):
You mean making the Codec decoder work over a byte[] as well, rather than only on an InputStream?
That is nice. The reason I did not go down that path is that my next work is E2E off heap in the
write path as well. So when a write req comes in, we might be reading it into an off heap BB
(from a pool), and then byte[] based decoding is not possible. Maybe BB based then. Also, I am
trying to experiment with reading the req into more than one single large BB. From the pool we
might get fixed-size smaller BBs (say 64 KB or so) and read the req into as many of those as needed.
The CellScanner then needs to work on a set of BBs (like the MultiByteBuff stuff), at which point
even a BB based API becomes an issue. Continuing with an InputStream based API gives
us the freedom to experiment with these different data structures.

bq.Should we default to MSLAB for good? I don't think anybody runs with MSLAB off.
Yes, we have MSLAB enabled by default. I agree that doing the MSLAB check in the RPC layer looks
ugly. I wanted to avoid memstore cells referring to the byte[] the req was read into when someone
turns MSLAB off. So what do you say? Remove this check?

bq.Can the byte[4]'s be statically allocated?
No, we cannot. There will be many reqs processed in parallel and many instances of this InputStream
in action. Only within a single instance do we have no multi-threaded reads. Making it static would
make all IS instances refer to the same byte[]!

> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>
>                 Key: HBASE-15180
>                 URL: https://issues.apache.org/jira/browse/HBASE-15180
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15180.patch
>
>
> In KeyValueDecoder#parseCell (the default Codec decoder) we use KeyValueUtil#iscreate to
> read cells from the InputStream. Here we first create a byte[] of length 4 and read the cell
> length, then create an array of the cell's length, read the cell bytes into it, and create a KV.
> Actually, in the server we read the reqs into a byte[], and the CellScanner is created on top of
> a ByteArrayInputStream on top of this. By default in the write path we have MSLAB usage ON. So
> while adding Cells to the memstore, we copy the Cell bytes to MSLAB memory chunks (default
> 2 MB size) and recreate Cells over those bytes. So there is no issue if we create Cells over
> the RPC read byte[] directly here in the Decoder. No need for 2 byte[] creations and a copy for
> every Cell in the request.
> My plan is to make a Cell-aware ByteArrayInputStream which can read Cells directly from
> it.
> The same Codec path is used on the client side also. There it is better to avoid this direct
> Cell creation and continue with the copy-to-smaller-byte[]s path. Plan to introduce something
> like a CodecContext associated with every Codec instance which can tell the server/client context.
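
A minimal sketch of what a cell-aware ByteArrayInputStream could look like, assuming a 4-byte big-endian length followed by the cell bytes. `SliceAwareInputStream`, `Slice`, and `readSlice` are hypothetical names and do not come from the patch; a real implementation would build a Cell over the returned region instead of a plain slice.

```java
import java.io.ByteArrayInputStream;

/** Hypothetical stream that hands out references into its backing array instead of copies. */
class SliceAwareInputStream extends ByteArrayInputStream {

  /** Zero-copy reference to a region of the backing byte[] (stand-in for a Cell). */
  static final class Slice {
    final byte[] array;
    final int offset;
    final int length;

    Slice(byte[] array, int offset, int length) {
      this.array = array;
      this.offset = offset;
      this.length = length;
    }
  }

  SliceAwareInputStream(byte[] buf) {
    super(buf);
  }

  /** Reads a length prefix, then returns a slice over the payload without copying it. */
  synchronized Slice readSlice() {
    // buf, pos and count are protected fields of ByteArrayInputStream.
    if (count - pos < 4) {
      throw new IllegalStateException("not enough bytes for a length prefix");
    }
    int len = ((buf[pos] & 0xFF) << 24) | ((buf[pos + 1] & 0xFF) << 16)
        | ((buf[pos + 2] & 0xFF) << 8) | (buf[pos + 3] & 0xFF);
    pos += 4;
    if (len < 0 || len > count - pos) {
      throw new IllegalStateException("invalid payload length: " + len);
    }
    Slice slice = new Slice(buf, pos, len);
    pos += len; // skip over the payload; no second byte[] is created
    return slice;
  }
}
```

Because the slice points into the request byte[], this only stays safe while something downstream (MSLAB, in the description above) copies the bytes before the request buffer is reused or retained.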



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
