hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization
Date Wed, 23 Jul 2014 17:01:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071964#comment-14071964 ]

Daryn Sharp commented on HDFS-6709:

If {{Unsafe}} is being removed, then I don't think we should create a dependency on it.  Sadly,
while investigating off-heap performance last fall, I found this article claiming that off-heap
reads via a {{DirectByteBuffer}} have *horrible* performance:

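(The article link did not survive the archive.  As a rough illustration of the comparison at issue, here is a hypothetical micro-benchmark sketch, not taken from the article: sequential long reads from an on-heap array vs. a direct ByteBuffer.  It only shows the two access patterns; it is not a rigorous benchmark, and results will vary by JVM and hardware.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: compare sequential long reads from an on-heap
// long[] against absolute getLong() calls on a direct ByteBuffer.
public class ReadBench {
    static final int N = 1 << 20; // one million longs

    // Plain on-heap array read: bounds-checked, JIT-friendly.
    static long sumArray(long[] a) {
        long s = 0;
        for (long v : a) s += v;
        return s;
    }

    // Off-heap read through a direct buffer's absolute accessor.
    static long sumDirect(ByteBuffer b) {
        long s = 0;
        for (int i = 0; i < N; i++) s += b.getLong(i * Long.BYTES);
        return s;
    }

    public static void main(String[] args) {
        long[] a = new long[N];
        ByteBuffer b = ByteBuffer.allocateDirect(N * Long.BYTES)
                                 .order(ByteOrder.nativeOrder());
        for (int i = 0; i < N; i++) { a[i] = i; b.putLong(i * Long.BYTES, i); }

        long t0 = System.nanoTime();
        long s1 = sumArray(a);
        long t1 = System.nanoTime();
        long s2 = sumDirect(b);
        long t2 = System.nanoTime();
        System.out.printf("array: %d ns, direct: %d ns, sums equal: %b%n",
                          t1 - t0, t2 - t1, s1 == s2);
    }
}
```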

bq. With a hash table and a linked list, we could probably start off-heaping things such as
the triplets array in the BlockInfo object.

How do you envision off-heaping the triplets in conjunction with those collections?  Linked list
entries cost 48 bytes on a 64-bit JVM, and a hash table entry costs 52 bytes.  I know your goal
is reduced GC while ours is reduced memory usage, so it would be unacceptable if an off-heap
implementation consumed even more memory - memory which, incidentally, will still require GC
and may cancel any off-heap benefit, or even cause a performance degradation.

> Implement off-heap data structures for NameNode and other HDFS memory optimization
> ----------------------------------------------------------------------------------
>                 Key: HDFS-6709
>                 URL: https://issues.apache.org/jira/browse/HDFS-6709
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-6709.001.patch
> We should investigate implementing off-heap data structures for NameNode and other HDFS
> memory optimization.  These data structures could reduce latency by avoiding the long GC times
> that occur with large Java heaps.  We could also avoid per-object memory overheads and control
> memory layout a little bit better.  This also would allow us to use the JVM's "compressed
> oops" optimization even with really large namespaces, if we could get the Java heap below
> 32 GB for those cases.  This would provide another performance and memory efficiency boost.
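(A minimal sketch of the kind of structure the description proposes, hypothetical and not the attached patch: fixed-width records packed into a direct ByteBuffer carry no per-object headers, so N three-long records cost exactly N * 24 bytes off the Java heap.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: a fixed-width off-heap record store backed by a
// direct ByteBuffer.  Each record is three longs (e.g. a block "triplet"),
// stored with no per-object header or reference overhead.
public class OffHeapTriplets {
    private static final int RECORD_BYTES = 3 * Long.BYTES; // 24 bytes/record
    private final ByteBuffer buf;

    public OffHeapTriplets(int capacity) {
        buf = ByteBuffer.allocateDirect(capacity * RECORD_BYTES)
                        .order(ByteOrder.nativeOrder());
    }

    public void put(int index, long a, long b, long c) {
        int off = index * RECORD_BYTES;
        buf.putLong(off, a);
        buf.putLong(off + 8, b);
        buf.putLong(off + 16, c);
    }

    // field is 0, 1, or 2
    public long get(int index, int field) {
        return buf.getLong(index * RECORD_BYTES + field * Long.BYTES);
    }
}
```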

This message was sent by Atlassian JIRA
