Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 24 Jul 2014 17:26:39 +0000 (UTC)
From: "Daryn Sharp (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12728339.1405727679556.38667.1406222799976@arcas>
In-Reply-To: <JIRA.12728339.1405727679556@arcas>
References: <JIRA.12728339.1405727679556@arcas>
Subject: [jira] [Commented] (HDFS-6709) Implement off-heap data structures
 for NameNode and other HDFS memory optimization
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073404#comment-14073404 ] 

Daryn Sharp commented on HDFS-6709:
-----------------------------------

Questions/comments on the advantages:
* I thought RTTI is per class, not per instance?  If so, aren't the savings immaterial?
* Using misaligned access may cause processor incompatibility, hurt performance, and introduce atomicity and CAS problems; concurrent access to adjacent misaligned memory in the same cache line may be completely unsafe.
* Only primitives, not references, can be stored off-heap, so how do value types (non-boxed primitives, correct?) apply?  Wouldn't the instance managing the slab have methods that return the correct primitive?
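Here is a minimal sketch of the slab-manager shape asked about in the last point. It assumes Java 9+ (for {{ByteBuffer.alignedSlice}}) and uses a hypothetical {{FooSlab}} class whose record layout is invented for illustration. Only primitives are stored off-heap, and the accessors return primitives rather than a {{Foo}}. Every field sits at a naturally aligned offset, so the misaligned-access concerns don't arise.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical off-heap slab of fixed-size "Foo" records. Only primitives
 * live off-heap; callers get primitives back, never a Foo object.
 */
public final class FooSlab {
    // Hypothetical record layout, every field naturally aligned:
    //   field1 int  @ 0   (4 bytes + 4 bytes padding)
    //   field2 long @ 8
    //   field4 long @ 16
    //   field3 byte @ 24  (boolean stored as 0/1, then 7 bytes padding)
    private static final int FIELD1 = 0;
    private static final int FIELD2 = 8;
    private static final int FIELD4 = 16;
    private static final int FIELD3 = 24;
    static final int RECORD_SIZE = 32; // a multiple of 8 keeps every record 8-byte aligned

    private final ByteBuffer slab;
    private final int capacity;

    public FooSlab(int capacity) {
        this.capacity = capacity;
        // Over-allocate by 7 bytes so an 8-byte-aligned slice of the full size fits.
        ByteBuffer raw = ByteBuffer.allocateDirect(capacity * RECORD_SIZE + 7);
        // alignedSlice (Java 9+) starts the view on an 8-byte memory boundary.
        this.slab = raw.alignedSlice(8).order(ByteOrder.nativeOrder());
    }

    private int base(int id) {
        if (id < 0 || id >= capacity) {
            throw new IndexOutOfBoundsException("id " + id);
        }
        return id * RECORD_SIZE;
    }

    public void put(int id, int f1, long f2, boolean f3, long f4) {
        int b = base(id);
        slab.putInt(b + FIELD1, f1);
        slab.putLong(b + FIELD2, f2);
        slab.put(b + FIELD3, (byte) (f3 ? 1 : 0));
        slab.putLong(b + FIELD4, f4);
    }

    // One accessor per former Foo field; each returns a primitive.
    public int getField1(int id)     { return slab.getInt(base(id) + FIELD1); }
    public long getField2(int id)    { return slab.getLong(base(id) + FIELD2); }
    public boolean getField3(int id) { return slab.get(base(id) + FIELD3) != 0; }
    public long getField4(int id)    { return slab.getLong(base(id) + FIELD4); }

    public static void main(String[] args) {
        FooSlab s = new FooSlab(4);
        s.put(2, 7, 1L << 40, true, -1L);
        System.out.println(s.getField1(2) + " " + s.getField2(2) + " "
                + s.getField3(2) + " " + s.getField4(2));
    }
}
```

Padding each field to its natural alignment gives back some of the space savings. That cost is the flip side of avoiding misaligned access.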

I think off-heap may be a win in some limited cases, but I'm struggling with how it will work in practice.  Here are some thoughts, seeking clarification on how the technique would actually be applied:
# OO encapsulation and polymorphism are lost?
# We can't store references anymore so we're reduced to primitives?
# Let's say we used to have a class {{Foo}} with instance fields {{field1..field4}} of various types, and {{FooManager.get(id)}} returns a {{Foo}} instance.  An off-heap structure must not hold any instantiated {{Foo}} entries; otherwise there is no GC benefit beyond having smaller instances to compact.
# Does {{FooManager}} instantiate new {{Foo}} instances every time {{FooManager.get(id)}} is called?  If yes, it generates a tremendous amount of garbage that defeats the GC benefit of going off heap.
# Does {{FooManager}} try to maintain a limited pool of mutable {{Foo}} objects for reuse (e.g. via a {{Foo#reinitialize(id, f1..f4)}})?  (I've tried this technique elsewhere and saw degraded performance, but maybe there's a good way to do it.)
# If no {{Foo}} entries are allowed:
## Does {{FooManager}} have methods for every data member that used to be encapsulated by {{Foo}}, i.e. {{FooManager.getField$N(id)}}?  Will we have to make N calls, probably within a critical section?
## Will APIs change from {{doSomething(Foo foo, String msg, boolean flag)}} to {{doSomething(Long fooId, int fooField1, long fooField2, boolean fooField3, long fooField4, String msg, boolean flag)}}?
## If we add another field, do we go back and update all the APIs again?
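For comparison with the pooled-object and per-field-getter options raised above, here is a minimal sketch of a reusable "flyweight" view. All names ({{FooManager}}, {{FooView}}, the record layout) are hypothetical. The sketch makes no performance claim; the comment itself reports degraded performance with a similar technique elsewhere. It only shows that one mutable view, re-pointed at off-heap records, can keep a {{doSomething(Foo, ...)}}-shaped API without allocating per lookup.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical "flyweight" variant of the pooled-Foo idea: one mutable view
 * is re-pointed at different off-heap records instead of allocating a new
 * Foo per lookup. Not thread-safe, and a view is only valid until its next
 * moveTo(), so each thread (or call site) must own its view.
 */
public final class FooFlyweightDemo {
    // Hypothetical layout: field1 int@0, field2 long@8, field4 long@16, field3 byte@24
    static final int RECORD_SIZE = 32;

    /** Owns the off-heap storage; hands out no Foo objects. */
    static final class FooManager {
        final ByteBuffer slab;

        FooManager(int capacity) {
            slab = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                             .order(ByteOrder.nativeOrder());
        }

        void put(int id, int f1, long f2, boolean f3, long f4) {
            int b = id * RECORD_SIZE;
            slab.putInt(b, f1).putLong(b + 8, f2).putLong(b + 16, f4)
                .put(b + 24, (byte) (f3 ? 1 : 0));
        }

        /** Re-points a caller-owned view at record id; allocates nothing. */
        FooView get(int id, FooView reuse) {
            return reuse.moveTo(this, id);
        }
    }

    /** Mutable view exposing the getters Foo used to have. */
    static final class FooView {
        private ByteBuffer slab;
        private int base;

        FooView moveTo(FooManager m, int id) {
            slab = m.slab;
            base = id * RECORD_SIZE;
            return this;
        }

        int field1()     { return slab.getInt(base); }
        long field2()    { return slab.getLong(base + 8); }
        boolean field3() { return slab.get(base + 24) != 0; }
        long field4()    { return slab.getLong(base + 16); }
    }

    // Keeps the doSomething(Foo, String, boolean) shape: adding a field changes
    // the layout, FooManager and FooView, but not every caller's signature.
    static String doSomething(FooView foo, String msg, boolean flag) {
        return msg + ":" + foo.field1() + ":" + foo.field2() + ":"
             + foo.field3() + ":" + foo.field4() + ":" + flag;
    }

    public static void main(String[] args) {
        FooManager mgr = new FooManager(8);
        mgr.put(0, 1, 10L, false, 100L);
        mgr.put(1, 2, 20L, true, 200L);
        FooView view = new FooView(); // one allocation, reused for every lookup
        for (int id = 0; id < 2; id++) {
            System.out.println(doSomething(mgr.get(id, view), "foo" + id, id == 1));
        }
    }
}
```

The price of this design is aliasing. Any code that holds on to a {{FooView}} past the next {{get}} silently sees a different record. That is a correctness hazard the old immutable-ish {{Foo}} instances did not have.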


> Implement off-heap data structures for NameNode and other HDFS memory optimization
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-6709
>                 URL: https://issues.apache.org/jira/browse/HDFS-6709
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-6709.001.patch
>
>
> We should investigate implementing off-heap data structures for NameNode and other HDFS memory optimization.  These data structures could reduce latency by avoiding the long GC times that occur with large Java heaps.  We could also avoid per-object memory overheads and control memory layout a little bit better.  This also would allow us to use the JVM's "compressed oops" optimization even with really large namespaces, if we could get the Java heap below 32 GB for those cases.  This would provide another performance and memory efficiency boost.
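For reference, the 32 GB figure in the description comes from HotSpot's compressed oops. A 32-bit compressed reference is scaled by the object alignment (8 bytes by default, set via {{-XX:ObjectAlignmentInBytes}}), so it can span 2^32 * 8 bytes of heap. Here is a minimal sketch of that arithmetic; the helper name {{maxHeapBytes}} is hypothetical.

```java
/** Arithmetic behind the compressed-oops heap limit (hypothetical helper). */
public final class CompressedOopsLimit {
    /** Largest heap a 32-bit compressed reference can span for a given object alignment. */
    static long maxHeapBytes(long alignmentBytes) {
        long distinctRefs = 1L << 32;           // values a 32-bit compressed reference can hold
        return distinctRefs * alignmentBytes;   // each value names one alignment-sized slot
    }

    public static void main(String[] args) {
        long bytes = maxHeapBytes(8); // 8 = HotSpot's default ObjectAlignmentInBytes
        System.out.println(bytes + " bytes = " + (bytes >> 30) + " GiB");
    }
}
```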


--
This message was sent by Atlassian JIRA
(v6.2#6252)