hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
Date Sat, 30 Jan 2016 01:28:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124586#comment-15124586
] 

Anoop Sam John commented on HBASE-15180:
----------------------------------------

CellScanner - Ya we made it like normal our scanner way of call advance and then get current.
 You mean we should have been doing just getting next Cell from the Decoder?  
bq. If so, we have CellOutputStream, should your CellReadable be a CellInputStream with read
methods that return Cells to mirror the write methods we have in CellOutputStream. Your CellReadableByteArrayInputStream
would become CellByteArrayInputStream and would implement CellInputStream.
SO u suggest renaming of the interface.  That should be fine and looks better.

bq.I've asked this before I know but do we have to flag when tags and when without? Internally,
when we read, the Cell will know if it has tags or not?
To avoid the overhead of parsing tagsLength every time this was done.  The old method we used
to read cells from InputStream was
iscreate(final InputStream in, boolean withTags)
So KVCodecWithTags only pass this as true and we make KeyValue instance.  By default we have
KVCodec which pass false and we make NoTagsKV on which getTagsLength is just return 0.  But
there is still an issue more.  When we copy the Cell to MSLAB before adding to Memstore, we
do copy and make a new KeyValue only. So this NoTagsKV becomes KV there !!!    Need handle
but another issue.
Ya as u said, we can avoid passing this boolean and after the Cell is been read out in new
CellInputStream (KeyValue object then), we need to parse tags length once and see it is 0
or >0 and based on that recreate NoTagsKV if needed.  That is one parsing op extra and
one Object creation extra. For that gone with passing the boolean at that time I believe.
 What do u say?

{quote}
What is the length in the below?
Cell readCell(int length, boolean withTags) throws IOException;
Do we have to pass this in each time?
{quote}
This was needed because of the way we have this PushbackIS.  In Decoder impl, we wrap the
actual IS with this PBIS.  The Decoder advance operation, reads one byte to know whether it
is end.  And now the remaining 3 bytes and old already read byte to be considered to get the
Cell's total length.  Said that the read of the Cell's length has to happen in the PBIS impl.
 (there only we know the old already read single byte)..  And the read of the Cell (Construction
of Cell) to be done in CellByteArrayInputStream.  That is why we have to pass the length.
I agree it looks ugly... I had no other way..  This PBIS thing was done to read Cells in WAL
decoder properly.   In case of reading Cells from byte[] from RPC, we dont really have such
an issue.   Let me see some way we can solve this. May be we need special treatment for both
cases.

bq.IPCUtil takes a Configuration? Can we not just read the Configuration on construction rather
than pass this flag per call?
IPCUtil #createCellScanner(final Codec codec, final CompressionCodec compressor,  final byte
[] cellBlock)  -  Used from RpcClient
new method called from RpcServer alone.  Yes I dont want this direct Cell create to happen
in client side where we dont have an MSLAB copy.   Still as u said, if we read the conf property
and set in IPCUtil and use that while construction of Stream, I can avoid this passing of
boolean.   If any chance the config xml in server is used in the client and that says use
MSLAB (Still we dont have as we dont have Memstore and all there), we will do it wrongly no.
  That is I was afraid to do that.  

Now any way you suggest add a new config to decide this copy or not rather than rely on MSLAB.
I think ya it is good. Only downside is again a new config!  But ya it looks better..  Let
me do that.. 

> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>
>                 Key: HBASE-15180
>                 URL: https://issues.apache.org/jira/browse/HBASE-15180
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15180.patch, HBASE-15180_V2.patch
>
>
> In KeyValueDecoder#parseCell (Default Codec decoder) we use KeyValueUtil#iscreate to
read cells from the InputStream. Here we 1st create a byte[] of length 4 and read the cell
length and then an array of Cell's length and read in cell bytes into it and create a KV.
> Actually in server we read the reqs into a byte[] and CellScanner is created on top of
a ByteArrayInputStream on top of this. By default in write path, we have MSLAB usage ON. So
while adding Cells to memstore, we will copy the Cell bytes to MSLAB memory chunks (default
2 MB size) and recreate Cells over that bytes.  So there is no issue if we create Cells over
the RPC read byte[] directly here in Decoder.  No need for 2 byte[] creation and copy for
every Cell in request.
> My plan is to make a Cell aware ByteArrayInputStream which can read Cells directly from
it.  
> Same Codec path is used in client side also. There better we can avoid this direct Cell
create and continue to do the copy to smaller byte[]s path.  Plan to introduce some thing
like a CodecContext associated with every Codec instance which can say the server/client context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message