hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
Date Fri, 29 Jan 2016 20:14:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124131#comment-15124131 ]

stack commented on HBASE-15180:

Why not use CellCodec for CellReadableByteArrayInputStream? The CellCodec decoder takes an
InputStream and returns Cells via a CellScanner implementation.

What is the difference between a CellReadable and a CellScanner? With a CellScanner you have
to call advance and then current each time, which is a little more awkward, but it is a common
pattern we want to institute throughout. I suppose it is awkward; that'd be your argument. If
so, we already have CellOutputStream: should your CellReadable be a CellInputStream with read
methods that return Cells, mirroring the write methods we have in CellOutputStream? Your
CellReadableByteArrayInputStream would then become CellByteArrayInputStream and would
implement CellInputStream.
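
To make the two shapes concrete, here is a minimal sketch of the advance/current style next
to a read-method style. Everything in it is a hypothetical stand-in, not the HBase API: a
"cell" is just a length-prefixed byte[], and SketchCellScanner, SketchCellInputStream and
LengthPrefixedCells are names invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the advance/current pattern.
interface SketchCellScanner {
    boolean advance() throws IOException; // move to the next cell; false when exhausted
    byte[] current();                     // the cell positioned by the last advance()
}

// Hypothetical stand-in for a read-method stream mirroring an output stream's write methods.
interface SketchCellInputStream {
    byte[] readCell() throws IOException; // the next cell, or null at end of stream
}

// One reader offering both shapes over the same length-prefixed bytes.
final class LengthPrefixedCells implements SketchCellScanner, SketchCellInputStream {
    private final DataInputStream in;
    private byte[] current;

    LengthPrefixedCells(InputStream in) {
        this.in = new DataInputStream(in);
    }

    @Override
    public byte[] readCell() throws IOException {
        int length;
        try {
            length = in.readInt(); // 4-byte big-endian length prefix
        } catch (EOFException e) {
            return null;           // no more cells
        }
        byte[] cell = new byte[length];
        in.readFully(cell);
        return cell;
    }

    @Override
    public boolean advance() throws IOException {
        current = readCell();
        return current != null;
    }

    @Override
    public byte[] current() {
        return current;
    }
}

final class CellAccessSketch {
    // Encodes each string as a 4-byte length followed by its UTF-8 bytes.
    static byte[] encode(String... cells) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String c : cells) {
            byte[] b = c.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
        }
        return bytes.toByteArray();
    }

    // Caller drives the scanner with advance() then current().
    static List<String> viaScanner(byte[] block) throws IOException {
        SketchCellScanner scanner = new LengthPrefixedCells(new ByteArrayInputStream(block));
        List<String> out = new ArrayList<>();
        while (scanner.advance()) {
            out.add(new String(scanner.current(), StandardCharsets.UTF_8));
        }
        return out;
    }

    // Caller pulls cells with a single read call per cell.
    static List<String> viaRead(byte[] block) throws IOException {
        SketchCellInputStream in = new LengthPrefixedCells(new ByteArrayInputStream(block));
        List<String> out = new ArrayList<>();
        for (byte[] cell = in.readCell(); cell != null; cell = in.readCell()) {
            out.add(new String(cell, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] block = encode("row1", "row2");
        System.out.println(viaScanner(block)); // prints [row1, row2]
        System.out.println(viaRead(block));    // prints [row1, row2]
    }
}
```

Both loops yield the same cells; the difference is only whether the caller makes two calls
per cell or one. Note that readCell() here discovers the length from the stream itself, so
the caller does not have to supply it.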

I know I've asked this before, but do we have to flag when a Cell has tags and when it does
not? Internally, when we read, will the Cell not know whether it has tags?

What is the length in the below?

	  Cell readCell(int length, boolean withTags) throws IOException;

Do we have to pass this in each time?

	   * @param directCellRead
	   *          Whether to make Cells directly from the cellBlock bytes or need to copy. Pass false
	   *          while using from client side.

IPCUtil takes a Configuration? Can we not just read the Configuration on construction rather
than pass this flag per call?

Seems like you want server-side and client-side to act differently. Having RPCServer 'know'
about MSLAB doesn't seem right; it pollutes rpc with internals on how we do memstore.
Can we have another property for when we should lean on the rpc buffer (we 'know' it is safe
when mslab is going on), or perhaps a method that obscures the rationale for when to copy?
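
A rough sketch of what I mean, under loud assumptions: the class, the property key and the
method names below are all invented for illustration, and a plain Map stands in for a
Configuration. The flag is read once on construction, and callers only ever ask
canReferenceRpcBuffer(), which says nothing about MSLAB.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;

final class CellBlockPolicySketch {
    // Hypothetical key; not a real HBase configuration property.
    static final String REFERENCE_RPC_BUFFER_KEY = "sketch.ipc.cellblock.reference.rpc.buffer";

    private final boolean referenceRpcBuffer;

    // Read the setting once at construction instead of passing a flag on every call.
    CellBlockPolicySketch(Map<String, String> conf) {
        this.referenceRpcBuffer =
            Boolean.parseBoolean(conf.getOrDefault(REFERENCE_RPC_BUFFER_KEY, "false"));
    }

    // Hides the rationale: callers learn only whether the rpc buffer may be referenced.
    boolean canReferenceRpcBuffer() {
        return referenceRpcBuffer;
    }

    // Returns the cell bytes either as a view over cellBlock or as a private copy.
    ByteBuffer cellBytes(byte[] cellBlock, int offset, int length) {
        if (canReferenceRpcBuffer()) {
            return ByteBuffer.wrap(cellBlock, offset, length); // shares cellBlock
        }
        return ByteBuffer.wrap(Arrays.copyOfRange(cellBlock, offset, offset + length));
    }

    public static void main(String[] args) {
        byte[] block = {1, 2, 3, 4, 5, 6};
        CellBlockPolicySketch serverSide =
            new CellBlockPolicySketch(Map.of(REFERENCE_RPC_BUFFER_KEY, "true"));
        CellBlockPolicySketch clientSide = new CellBlockPolicySketch(Map.of());
        System.out.println(serverSide.cellBytes(block, 2, 3).array() == block); // prints true
        System.out.println(clientSide.cellBytes(block, 2, 3).array() == block); // prints false
    }
}
```

The default here is to copy, so a client that sets nothing keeps the safe behaviour.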

Patch looks great otherwise.

> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>                 Key: HBASE-15180
>                 URL: https://issues.apache.org/jira/browse/HBASE-15180
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>         Attachments: HBASE-15180.patch, HBASE-15180_V2.patch
> In KeyValueDecoder#parseCell (Default Codec decoder) we use KeyValueUtil#iscreate to
> read cells from the InputStream. Here we first create a byte[] of length 4 and read the cell
> length, then create an array of the Cell's length, read the cell bytes into it, and create a KV.
> Actually, in the server we read the reqs into a byte[], and the CellScanner is created on top
> of a ByteArrayInputStream on top of this. By default in the write path, we have MSLAB usage ON.
> So while adding Cells to the memstore, we copy the Cell bytes to MSLAB memory chunks (default
> 2 MB size) and recreate Cells over those bytes. So there is no issue if we create Cells over
> the RPC read byte[] directly here in the Decoder. No need for 2 byte[] creations and a copy
> for every Cell in the request.
> My plan is to make a Cell aware ByteArrayInputStream which can read Cells directly from
> The same Codec path is used on the client side also. There it is better to avoid this direct
> Cell creation and continue with the copy-to-smaller-byte[]s path. The plan is to introduce
> something like a CodecContext associated with every Codec instance, which can say whether the
> context is server or client.
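
The saving described in the issue can be illustrated with a small sketch, assuming a
simplified wire format of a 4-byte big-endian length followed by the cell bytes.
DirectCellSliceSketch and sliceCells are hypothetical names, and a ByteBuffer view stands in
for a Cell; this is not the patch's code. The length is decoded in place from the request
array, and each cell is a view over that same array, so no 4-byte scratch array and no
per-cell copy are created.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class DirectCellSliceSketch {
    // Walks the request array and returns one view per length-prefixed cell.
    static List<ByteBuffer> sliceCells(byte[] request) {
        List<ByteBuffer> cells = new ArrayList<>();
        int offset = 0;
        while (offset < request.length) {
            if (request.length - offset < 4) {
                throw new IllegalArgumentException("truncated length prefix at " + offset);
            }
            // Decode the big-endian length in place; no temporary byte[4].
            int length = ((request[offset] & 0xff) << 24)
                | ((request[offset + 1] & 0xff) << 16)
                | ((request[offset + 2] & 0xff) << 8)
                | (request[offset + 3] & 0xff);
            offset += 4;
            if (length < 0 || length > request.length - offset) {
                throw new IllegalArgumentException("bad cell length " + length);
            }
            // The view shares the request array; the cell bytes are not copied.
            cells.add(ByteBuffer.wrap(request, offset, length));
            offset += length;
        }
        return cells;
    }

    public static void main(String[] args) {
        byte[] request = {0, 0, 0, 2, 10, 11, 0, 0, 0, 1, 12};
        List<ByteBuffer> cells = sliceCells(request);
        System.out.println(cells.size());                    // prints 2
        System.out.println(cells.get(0).array() == request); // prints true
    }
}
```

Such views are only safe while nothing reuses or mutates the request array, which is why the
description leans on the later copy into MSLAB chunks on the server side.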

This message was sent by Atlassian JIRA
