hadoop-hdfs-issues mailing list archives

From "Zheng Hu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14535) The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots of heap allocation in HBase when using short-circut read
Date Tue, 04 Jun 2019 03:12:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855258#comment-16855258 ]

Zheng Hu commented on HDFS-14535:

bq. if you're missing the "open file" often enough for this to make a difference, is the short-circuit
feature actually helpful for the workload? Maybe you would be better off disabling it entirely.
Hmm... IIRC, it seems our HBase code has a bug here: for Get operations, all queries should
share the same reader, which means we shouldn't be requesting the short-circuit fd so frequently.
Anyway, I can check the HBase code again, and the HDFS PR can still be merged into trunk.
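To illustrate the point about sharing one reader, here is a minimal Java sketch built around a hypothetical per-file reader cache. `SharedReaderSketch`, `SharedReader`, and the counter are illustrative names, not HBase or HDFS classes; each cache miss stands in for one short-circuit fd request, so repeated Gets on the same file should trigger only one request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedReaderSketch {
    /** Hypothetical reader that would hold one short-circuit file descriptor. */
    static final class SharedReader {
        final String path;
        SharedReader(String path) { this.path = path; }
    }

    private final Map<String, SharedReader> readers = new ConcurrentHashMap<>();
    private final AtomicInteger fdRequests = new AtomicInteger();

    /** Stands in for the expensive open path that requests an fd from the DataNode. */
    private SharedReader open(String path) {
        fdRequests.incrementAndGet();
        return new SharedReader(path);
    }

    /** Every Get on the same file reuses one reader instead of opening a new one. */
    SharedReader readerFor(String path) {
        return readers.computeIfAbsent(path, this::open);
    }

    int fdRequests() { return fdRequests.get(); }

    public static void main(String[] args) {
        SharedReaderSketch cache = new SharedReaderSketch();
        for (int i = 0; i < 1000; i++) {
            cache.readerFor("/hbase/data/default/t1/r1/cf/hfile1");
        }
        System.out.println("fd requests: " + cache.fdRequests()); // prints "fd requests: 1"
    }
}
```

If Gets were instead opening a fresh reader per query, the counter would grow with every call, which is the access pattern that makes the per-request buffer allocation show up in the flame graph.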
bq. I also noticed in the flame graph a couple of other suspicious items like Pattern.compile
in DomainSocket.getEffectivePath(). That's not a heavy memory allocator but certainly seems
like an easy thing to optimize out for some CPU win (in another patch).
That's true; I'll open a PR for this when I have time.
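The usual fix for this kind of `Pattern.compile` overhead is to compile once and reuse the result, since `java.util.regex.Pattern` instances are immutable and thread-safe. A minimal sketch, assuming the method substitutes the `_PORT` placeholder that `dfs.domain.socket.path` documents; the class and both methods are hypothetical and are not the actual `DomainSocket` code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EffectivePathSketch {
    // Placeholder token, assumed to match the one documented for dfs.domain.socket.path.
    private static final String PORT_TOKEN = "_PORT";

    // Before: a new Pattern is compiled (and allocated) on every call.
    static String effectivePathSlow(String path, int port) {
        return Pattern.compile(PORT_TOKEN, Pattern.LITERAL)
                .matcher(path)
                .replaceAll(Matcher.quoteReplacement(String.valueOf(port)));
    }

    // After: Pattern is immutable and thread-safe, so one compiled instance is shared.
    private static final Pattern PORT_PATTERN = Pattern.compile(PORT_TOKEN, Pattern.LITERAL);

    static String effectivePath(String path, int port) {
        // Matcher is not thread-safe, so a fresh one is created per call (cheap).
        return PORT_PATTERN.matcher(path)
                .replaceAll(Matcher.quoteReplacement(String.valueOf(port)));
    }

    public static void main(String[] args) {
        System.out.println(effectivePath("/var/run/hdfs/dn._PORT", 50010)); // prints /var/run/hdfs/dn.50010
    }
}
```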
bq. Mind sending this as a PR for the apache/hadoop repo on github?
I've created the PR attached to this issue; please take a look.

> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots
of heap allocation in HBase when using short-circut read
> ----------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-14535
>                 URL: https://issues.apache.org/jira/browse/HDFS-14535
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: HDFS-14535.patch
> Our HBase team is trying to read blocks from HDFS directly into pooled off-heap ByteBuffers
(HBASE-21879). In a recent benchmark, we found that almost 45% of heap allocations came
from the DFS client. The heap allocation flame graph can be seen here: https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
> After checking the code path, we found that when requesting file descriptors from a
DomainPeer, we allocate a large 8KB buffer for the BufferedOutputStream, even though the protocol
message is quite small, just a few bytes.
> This caused heavy GC pressure in HBase when cacheHitRatio < 60%, which increased the
HBase P999 latency. In fact, we can pre-allocate a much smaller buffer for the BufferedOutputStream,
such as 512 bytes, which is enough to hold the short-circuit fd protocol message. We've created
a patch that does this, and the allocation flame graph shows that after the patch, the share of heap allocation
from the DFS client dropped from 45% to 27%, which I think is a big improvement. See: https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
> Hopefully the attached patch can be merged into HDFS trunk and also Hadoop-2.8.x; HBase would
benefit a lot from this.
> Thanks. 
> For more details, see: https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639
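To make the buffer-size difference described in the issue concrete, here is a minimal Java sketch. The subclass exists only to read `BufferedOutputStream`'s protected `buf` field; a `ByteArrayOutputStream` stands in for the DomainPeer output stream, and `writeRequest` with its fields is a hypothetical placeholder, not the real HDFS wire format or the committed patch. The 512-byte size mirrors the suggestion in the issue description.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class SmallBufferSketch {
    // Size suggested in the issue description.
    static final int SMALL_BUFFER_SIZE = 512;

    /** Subclass used only to read the protected buf field and report its capacity. */
    static final class InspectableBufferedOutputStream extends BufferedOutputStream {
        InspectableBufferedOutputStream(OutputStream out) { super(out); }
        InspectableBufferedOutputStream(OutputStream out, int size) { super(out, size); }
        int capacity() { return buf.length; }
    }

    /** Hypothetical small request; the fields are placeholders, not the real HDFS wire format. */
    static void writeRequest(DataOutputStream out, long blockId, String clientName) {
        try {
            out.writeShort(1);        // placeholder "version" field, 2 bytes
            out.writeByte(2);         // placeholder "opcode" field, 1 byte
            out.writeLong(blockId);   // 8 bytes
            out.writeUTF(clientName); // 2-byte length prefix + modified UTF-8 bytes
            out.flush();              // pushes the buffered bytes to the peer stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stands in for the DomainPeer output stream.
        ByteArrayOutputStream peer = new ByteArrayOutputStream();
        InspectableBufferedOutputStream dflt = new InspectableBufferedOutputStream(peer);
        InspectableBufferedOutputStream small =
                new InspectableBufferedOutputStream(peer, SMALL_BUFFER_SIZE);
        System.out.println("default buffer: " + dflt.capacity() + " bytes"); // 8192 on OpenJDK
        System.out.println("small buffer:   " + small.capacity() + " bytes"); // 512

        writeRequest(new DataOutputStream(small), 1073741825L, "DFSClient_hbase_rs_1");
        System.out.println("request size:   " + peer.size() + " bytes"); // 33
    }
}
```

In OpenJDK the buffer array is allocated eagerly in the constructor, so each fd request on the original code path pays for 8 KB up front; with an explicit 512-byte size that per-request allocation shrinks 16x, while a request of a few dozen bytes, like the placeholder here, still fits in a single buffer.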

This message was sent by Atlassian JIRA
