hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
Date Thu, 21 Feb 2019 01:45:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773586#comment-16773586
] 

Duo Zhang commented on HBASE-21879:
-----------------------------------

The ByteBuff itself is IA.Private, but we expose it through the Codec interface
{code:title=Codec.java}
  Decoder getDecoder(ByteBuff buf);
{code}

And also through WALCellCodec, which is marked as IA.LimitedPrivate as well. Anyway, I cannot
find any place where we return a ByteBuff to the user, so maybe it is OK to just move the
method, as users will never call it?

Then we can make use of netty's ByteBuf directly, which already has reference counting.
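
To make the reference-counting point concrete, here is a minimal sketch in plain Java of what such a buffer offers: each holder retains the buffer, and the backing memory goes back to its pool only when the last holder releases it. `RefCountedBuffer` and its `recycler` callback are hypothetical names for illustration, not netty's ByteBuf or HBase's ByteBuff; only the method names `retain`, `release` and `refCnt` are borrowed from netty's `ReferenceCounted`.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only: NOT netty's ByteBuf and NOT HBase's ByteBuff. */
final class RefCountedBuffer {
  private final ByteBuffer buf;
  // Assumed hook: runs exactly once, when the last reference is released
  // (for example, to hand the memory back to a buffer pool).
  private final Runnable recycler;
  // A new buffer starts with one reference, owned by its creator.
  private final AtomicInteger refCnt = new AtomicInteger(1);

  RefCountedBuffer(ByteBuffer buf, Runnable recycler) {
    this.buf = buf;
    this.recycler = recycler;
  }

  int refCnt() {
    return refCnt.get();
  }

  /** Takes one more reference; fails if the buffer was already released. */
  RefCountedBuffer retain() {
    for (;;) {
      int cur = refCnt.get();
      if (cur <= 0) {
        throw new IllegalStateException("already released");
      }
      if (refCnt.compareAndSet(cur, cur + 1)) {
        return this;
      }
    }
  }

  /** Drops one reference; returns true if this call released the buffer. */
  boolean release() {
    for (;;) {
      int cur = refCnt.get();
      if (cur <= 0) {
        throw new IllegalStateException("already released");
      }
      if (refCnt.compareAndSet(cur, cur - 1)) {
        if (cur == 1) {
          recycler.run();
          return true;
        }
        return false;
      }
    }
  }

  /** Returns an independent view of the data while the buffer is still live. */
  ByteBuffer nioBuffer() {
    if (refCnt.get() <= 0) {
      throw new IllegalStateException("already released");
    }
    return buf.duplicate();
  }
}
```

In this sketch a reader that hands the block to an RPC response would call `retain()` before handing it over and `release()` once the response is written, so the memory is recycled only after both sides are done.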

> Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21879
>                 URL: https://issues.apache.org/jira/browse/HBASE-21879
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
>         Attachments: QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics)
>  throws IOException {
>  // .....
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>       onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
>   if (headerBuf != null) {
>         // ...
>   }
>   // ...
>  }
> {code}
> In the read path, we still read the block from the HFile into an on-heap byte[], then copy that on-heap byte[] to the off-heap bucket cache asynchronously. In my 100% get performance test I also observed frequent young GCs. The largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of into a byte[] to reduce young GC. We did not implement this before because the older HDFS client had no ByteBuffer reading interface, but 2.7+ supports it, so I think we can fix this now.
> Will provide a patch and some perf comparison for this.
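
The idea in the description above can be sketched with the JDK alone: do a positional read straight into a direct (off-heap) ByteBuffer, so no on-heap byte[] of block size is allocated per read. This is only an analogy under stated assumptions. It uses `java.nio.channels.FileChannel` on a local temp file rather than the HDFS client, whose ByteBuffer read path (Hadoop's `ByteBufferReadable#read(ByteBuffer)`) is not exercised here. `DirectBlockRead`, `readBlock` and `demo` are hypothetical names, and a real server would borrow the buffer from a pool instead of allocating it per call.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch only: a local-file stand-in for the HFile block read. */
final class DirectBlockRead {

  /** Reads exactly len bytes starting at offset into a direct ByteBuffer. */
  static ByteBuffer readBlock(FileChannel ch, long offset, int len) throws IOException {
    // Off-heap destination; allocated per call here only to keep the sketch short.
    ByteBuffer dst = ByteBuffer.allocateDirect(len);
    long pos = offset;
    while (dst.hasRemaining()) {
      // Positional read: does not move the channel's own position.
      int n = ch.read(dst, pos);
      if (n < 0) {
        throw new EOFException("file ended before " + len + " bytes were read");
      }
      pos += n;
    }
    dst.flip(); // switch from filling to reading
    return dst;
  }

  /** Writes 64 bytes (0..63) to a temp file and reads 16 of them from offset 8. */
  static ByteBuffer demo() {
    try {
      Path tmp = Files.createTempFile("block", ".bin");
      try {
        byte[] data = new byte[64];
        for (int i = 0; i < data.length; i++) {
          data[i] = (byte) i;
        }
        Files.write(tmp, data);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
          return readBlock(ch, 8, 16);
        }
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The block-sized allocation that remains lives outside the Java heap, which is the point of the proposal: the young generation no longer has to absorb one large byte[] per block read.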



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
