hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
Date Sat, 30 Jan 2016 14:16:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124905#comment-15124905

Anoop Sam John commented on HBASE-15180:

Because Codec and its Decoder are non-private, we cannot directly change the parameter or
return types. The same Decoder is also used for WAL reading. Note that getDecoder still takes
an InputStream. Currently we return a BAIS (ByteArrayInputStream) object from this IpcUtil
method, so to read a Cell we need to copy the cell bytes into a new byte[]. Only the actual
type of the object we return is changing: we extend BAIS to make a CellBAIS which implements
CellInputStream. While reading cells in the Decoder, we check whether the stream is a
CellInputStream, and if so we read Cells directly from it. When reading from the WAL, the same
code path is executed, but the InputStream won't be of type CellInputStream, so we copy from
the stream and make Cells as before. So Codec still gives a Decoder (which is of type
CellScanner) by taking an InputStream. I am making no change in this area. The only difference
is that we create a CellBAIS object instead of a BAIS, which saves the copy.
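
To make the flow above concrete, here is a minimal sketch of the idea, not the code in the attached patch. CellBAIS and CellInputStream are the names used above, but the readCell signature, the SketchCell and SketchDecoder classes, and the simple 4-byte big-endian length prefix are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical stand-in for a Cell: a view over (array, offset, length).
final class SketchCell {
    final byte[] array;
    final int offset;
    final int length;

    SketchCell(byte[] array, int offset, int length) {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }
}

// Assumed shape of the interface; the method name and signature are guesses.
interface CellInputStream {
    SketchCell readCell() throws IOException; // returns null at end of stream
}

// A ByteArrayInputStream that hands out cells backed by its own buffer.
class CellBAIS extends ByteArrayInputStream implements CellInputStream {
    CellBAIS(byte[] buf) {
        super(buf);
    }

    @Override
    public synchronized SketchCell readCell() throws IOException {
        // buf, pos and count are protected fields of ByteArrayInputStream.
        if (pos >= count) {
            return null;
        }
        if (count - pos < 4) {
            throw new EOFException("truncated cell length");
        }
        int len = ByteBuffer.wrap(buf, pos, 4).getInt(); // big-endian length prefix
        pos += 4;
        if (len < 0 || len > count - pos) {
            throw new IOException("invalid cell length: " + len);
        }
        SketchCell cell = new SketchCell(buf, pos, len); // no byte[] allocation or copy
        pos += len;
        return cell;
    }
}

// Hypothetical decoder: the public entry point still takes a plain InputStream.
final class SketchDecoder {
    static SketchCell parseCell(InputStream in) throws IOException {
        if (in instanceof CellInputStream) {
            // RPC path: build the cell directly over the request buffer.
            return ((CellInputStream) in).readCell();
        }
        // Any other stream (for example a WAL reader): copy the bytes out as before.
        DataInputStream din = new DataInputStream(in);
        int len;
        try {
            len = din.readInt();
        } catch (EOFException e) {
            return null; // EOF while reading the length is treated as end of stream
        }
        if (len < 0) {
            throw new IOException("invalid cell length: " + len);
        }
        byte[] bytes = new byte[len];
        din.readFully(bytes);
        return new SketchCell(bytes, 0, len);
    }
}
```

With this shape the decoder signature is untouched; only the runtime type of the stream decides whether a copy happens.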

I had thought of adding the Context so that the codec knows where it is running (client
or server). We now use different IpcUtil methods (with an extra boolean param for copy or
not), which is why I thought the Context is no longer needed. As you suggest, we can have a
new config which can be turned on at the server side and off at the client side.
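
As a rough sketch of the two options (a boolean param versus a config switch), something like the following could select the stream type. All names here are hypothetical: ZeroCopyBAIS, StreamFactorySketch and the "sketch.rpc.cells.zerocopy" key are made up for illustration and are not real HBase classes or properties, and java.util.Properties stands in for the real configuration object.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical marker subclass: a decoder could test for this type and skip the per-cell copy.
class ZeroCopyBAIS extends ByteArrayInputStream {
    ZeroCopyBAIS(byte[] buf) {
        super(buf);
    }
}

final class StreamFactorySketch {
    // Hypothetical config key, not a real HBase property.
    static final String ZERO_COPY_KEY = "sketch.rpc.cells.zerocopy";

    // Option 1: the caller passes an explicit flag.
    static InputStream createStream(byte[] rpcBuffer, boolean copyCells) {
        return copyCells ? new ByteArrayInputStream(rpcBuffer) : new ZeroCopyBAIS(rpcBuffer);
    }

    // Option 2: the choice comes from configuration (on for the server, off for the client).
    static InputStream createStream(byte[] rpcBuffer, Properties conf) {
        boolean zeroCopy = Boolean.parseBoolean(conf.getProperty(ZERO_COPY_KEY, "false"));
        return createStream(rpcBuffer, !zeroCopy);
    }
}
```

Defaulting the switch to off keeps the existing copying behaviour unless a deployment opts in.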

bq.lets fix that rather than pass extra flag all over.
Let me try.

> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>                 Key: HBASE-15180
>                 URL: https://issues.apache.org/jira/browse/HBASE-15180
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>         Attachments: HBASE-15180.patch, HBASE-15180_V2.patch
> In KeyValueDecoder#parseCell (the default Codec decoder) we use KeyValueUtil#iscreate to
> read cells from the InputStream. Here we first create a byte[] of length 4 and read the cell
> length into it, then create an array of the Cell's length, read the cell bytes into it, and
> create a KV.
> On the server we actually read the requests into a byte[], and the CellScanner is created
> on top of a ByteArrayInputStream over this array. By default, MSLAB usage is ON in the write
> path, so while adding Cells to the memstore we copy the Cell bytes to MSLAB memory chunks
> (default 2 MB size) and recreate the Cells over those bytes. So there is no issue if we
> create Cells directly over the RPC read byte[] here in the Decoder. There is no need to
> create two byte[]s and copy for every Cell in the request.
> My plan is to make a Cell aware ByteArrayInputStream which can read Cells directly from
> The same Codec path is also used on the client side. There it is better to avoid this
> direct Cell creation and continue with the path that copies into smaller byte[]s. I plan to
> introduce something like a CodecContext, associated with every Codec instance, which can
> indicate the server/client context.
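
For reference, the per-cell cost described in the quoted issue can be sketched as below. This is not the KeyValueUtil#iscreate source; CopyingReadSketch, readCellBytes and the allocation counter are hypothetical and exist only to show the two byte[] allocations made for each cell, assuming a 4-byte big-endian length prefix.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class CopyingReadSketch {
    // Counts byte[] allocations made by readCellBytes, for illustration only.
    static int allocations = 0;

    // Reads one length-prefixed cell, or returns null at a clean end of stream.
    static byte[] readCellBytes(InputStream in) throws IOException {
        byte[] lenBytes = new byte[4]; // allocation 1: holds the cell length
        allocations++;
        int n = in.read(lenBytes, 0, 4);
        if (n < 0) {
            return null;
        }
        readFully(in, lenBytes, n);
        int len = ((lenBytes[0] & 0xff) << 24) | ((lenBytes[1] & 0xff) << 16)
                | ((lenBytes[2] & 0xff) << 8) | (lenBytes[3] & 0xff);
        if (len < 0) {
            throw new IOException("invalid cell length: " + len);
        }
        byte[] cell = new byte[len]; // allocation 2: holds a copy of the cell bytes
        allocations++;
        readFully(in, cell, 0);
        return cell;
    }

    // Fills dst from index 'from' to its end, failing if the stream ends early.
    private static void readFully(InputStream in, byte[] dst, int from) throws IOException {
        int off = from;
        while (off < dst.length) {
            int n = in.read(dst, off, dst.length - off);
            if (n < 0) {
                throw new EOFException("stream ended mid-cell");
            }
            off += n;
        }
    }
}
```

Reading two cells this way performs four allocations, which is the garbage the issue aims to avoid on the server side.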

This message was sent by Atlassian JIRA
