hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization
Date Thu, 24 Jul 2014 17:26:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073404#comment-14073404 ]

Daryn Sharp commented on HDFS-6709:

Questions/comments on the advantages:
* I thought RTTI is per class, not per instance?  If so, the savings are immaterial?
* Misaligned access may cause processor incompatibility, hurt performance, and introduce
atomicity and CAS problems; concurrent access to adjacent misaligned memory in the same cache
line may be completely unsafe.
* With no references, only primitives can be stored off-heap, so how do value types (non-boxed
primitives, correct?) apply?  Wouldn't the instance managing the slab have methods that return
the correct primitive?
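
To make the alignment and "methods that return the correct primitive" points concrete, here is a minimal sketch of a fixed-size record slab. It is not taken from the HDFS-6709 patch: the class name {{AlignedSlab}}, the three fields, and the 16-byte layout are all hypothetical. It assumes the direct buffer's base address is itself at least 8-byte aligned, which the {{ByteBuffer}} API does not guarantee.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-size record slab; names and layout are illustrative only. */
final class AlignedSlab {
    // Fields are ordered largest-first so each one sits on its natural boundary
    // relative to the start of the record.
    static final int OFF_LENGTH  = 0;   // long, 8 bytes
    static final int OFF_REPL    = 8;   // int, 4 bytes
    static final int OFF_FLAG    = 12;  // one byte used as a boolean
    // 13 bytes of data padded to 16, so the long in the next record stays aligned.
    // A packed 13-byte record would put record 1's long at offset 13.
    static final int RECORD_SIZE = 16;

    private final ByteBuffer buf;

    AlignedSlab(int capacity) {
        // Direct buffers are allocated outside the Java heap.
        buf = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                        .order(ByteOrder.nativeOrder());
    }

    private static int base(int id) {
        return id * RECORD_SIZE;
    }

    void put(int id, long length, int repl, boolean flag) {
        int b = base(id);
        buf.putLong(b + OFF_LENGTH, length);
        buf.putInt(b + OFF_REPL, repl);
        buf.put(b + OFF_FLAG, (byte) (flag ? 1 : 0));
    }

    // The slab owner hands back primitives directly; nothing is boxed or allocated.
    long getLength(int id)  { return buf.getLong(base(id) + OFF_LENGTH); }
    int getRepl(int id)     { return buf.getInt(base(id) + OFF_REPL); }
    boolean getFlag(int id) { return buf.get(base(id) + OFF_FLAG) != 0; }
}
```

The padding costs 3 bytes per record in this layout; that is the price of keeping every multi-byte field aligned relative to the record start.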

I think off-heap may be a win in some limited cases, but I'm struggling with how it will work
in practice.  Here are some thoughts, for clarification, on the actual application of the technique:
# OO encapsulation and polymorphism are lost?
# We can't store references anymore so we're reduced to primitives?
# Let's say we used to have a class {{Foo}} with instance fields {{field1..field4}} of various
types.  {{FooManager.get(id)}} returns a {{Foo}} instance.  But now an off-heap structure can't
have any instantiated {{Foo}} entries; otherwise there is no GC benefit other than smaller
instances to compact.
# Does {{FooManager}} instantiate new {{Foo}} instances every time {{FooManager.get(id)}}
is called?  If yes, it generates a tremendous amount of garbage that defeats the GC benefit
of going off heap.
# Does {{FooManager}} try to maintain a limited pool of mutable {{Foo}} objects for reuse
(e.g. via {{Foo#reinitialize(id, f1..f4)}})?  (I've tried this technique elsewhere with degraded
performance, but maybe there's a good way to do it.)
# If no {{Foo}} entries are allowed:
## Does {{FooManager}} have methods for every data member that used to be encapsulated by
{{Foo}}?  I.e., {{FooManager.getField$N(id)}}?  We'll probably have to make N-many calls within
a critical section?
## Will APIs change from {{doSomething(Foo foo, String msg, boolean flag)}} to {{doSomething(Long
fooId, int fooField1, long fooField2, boolean fooField3, long fooField4, String msg, boolean
flag)}}?
## If we add another field, do we go back and update all the APIs again?
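
The three access patterns raised in the list above can be sketched side by side. This is only an illustration under stated assumptions: {{Foo}} and {{FooManager}} are the hypothetical names from the comment, reduced to two fields, and the record layout and method names ({{put}}, {{read}}, {{getField1}}, {{getField2}}) are invented for the example rather than taken from the patch.

```java
import java.nio.ByteBuffer;

/** Hypothetical mutable view of one off-heap record. */
final class Foo {
    long id;
    int field1;
    long field2;

    Foo reinitialize(long id, int field1, long field2) {
        this.id = id;
        this.field1 = field1;
        this.field2 = field2;
        return this;
    }
}

/** Hypothetical manager that owns the off-heap slab. */
final class FooManager {
    // Assumed layout: field2 (long) at offset 0, field1 (int) at offset 8, 4 bytes padding.
    private static final int RECORD_SIZE = 16;
    private final ByteBuffer slab;

    FooManager(int capacity) {
        slab = ByteBuffer.allocateDirect(capacity * RECORD_SIZE);
    }

    void put(int id, int field1, long field2) {
        slab.putLong(id * RECORD_SIZE, field2);
        slab.putInt(id * RECORD_SIZE + 8, field1);
    }

    // Option A: a fresh Foo per lookup. Keeps the old API, but every call allocates.
    Foo get(int id) {
        return read(id, new Foo());
    }

    // Option B: the caller supplies a Foo to overwrite, so lookups allocate nothing.
    // The instance must not be shared across threads or kept past the next read.
    Foo read(int id, Foo reuse) {
        return reuse.reinitialize(id, getField1(id), getField2(id));
    }

    // Option C: one accessor per field. No Foo at all, but callers pass ids around
    // and every new field adds another method.
    int getField1(int id)  { return slab.getInt(id * RECORD_SIZE + 8); }
    long getField2(int id) { return slab.getLong(id * RECORD_SIZE); }
}
```

A caller that scans many records might hold one {{Foo}} in a local variable and pass it to {{read}} in a loop, which keeps method signatures such as {{doSomething(Foo foo, String msg, boolean flag)}} unchanged. Whether that performs well under lock contention is exactly the open question in the comment.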

> Implement off-heap data structures for NameNode and other HDFS memory optimization
> ----------------------------------------------------------------------------------
>                 Key: HDFS-6709
>                 URL: https://issues.apache.org/jira/browse/HDFS-6709
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-6709.001.patch
> We should investigate implementing off-heap data structures for NameNode and other HDFS
> memory optimization.  These data structures could reduce latency by avoiding the long GC times
> that occur with large Java heaps.  We could also avoid per-object memory overheads and control
> memory layout a little bit better.  This also would allow us to use the JVM's "compressed
> oops" optimization even with really large namespaces, if we could get the Java heap below
> 32 GB for those cases.  This would provide another performance and memory efficiency boost.

This message was sent by Atlassian JIRA
