hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7844) Create an off-heap hash table implementation
Date Mon, 09 Mar 2015 22:44:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353791#comment-14353791 ]

Colin Patrick McCabe commented on HDFS-7844:

bq. Stack wrote: High level, any notion of difference in perf when comparing native to offheap
to current implementation?

Reading off-heap memory using Unsafe#getLong is very quick.  The main overhead from off-heap
will be creating wrapper objects for things.  But those are very short-lived objects that
should never make it past the GC's young generation.  The off-heap implementation may be able
to use less memory for some things because we control the packing, which would speed things
up (since fetching memory is a big cost in the block manager).  We will see some numbers soon.
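As a minimal sketch of the access pattern described above (not the patch's actual MemoryManager code), reading and writing off-heap memory with {{sun.misc.Unsafe}} looks like this; the reflection dance to obtain the {{theUnsafe}} singleton is needed because {{Unsafe.getUnsafe()}} rejects application classes:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapReadDemo {
    public static void main(String[] args) throws Exception {
        // sun.misc.Unsafe has no public constructor; grab the singleton via reflection.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // Allocate 16 bytes off-heap, write two longs, read one back.
        long addr = unsafe.allocateMemory(16);
        try {
            unsafe.putLong(addr, 0xCAFEBABEL);
            unsafe.putLong(addr + 8, 42L);
            System.out.println(unsafe.getLong(addr + 8)); // prints 42
        } finally {
            unsafe.freeMemory(addr); // off-heap memory is not GC-managed
        }
    }
}
```

Note the {{try/finally}}: unlike heap objects, this memory must be freed explicitly, which is exactly why wrapper objects (which the GC can reclaim cheaply in the young generation) are used to guard it.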

bq. If we fail to pick up the configured memory manager (or the default), its worth a WARN
log. Otherwise, folks may be confounded that they are getting the native memory manager though
they asked for something else:

We shouldn't need it because the creation of the hash table will log the name of the memory
manager and its type at INFO.

bq. This an arbitrary max? private final static long MAX_ADDRESS = 0x3fffffffffffffffL;

It's just nice because it allows the code to be provably correct.  I realize that the address
will never get there in any reasonable length of time.
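To illustrate the "provably correct" point (this is an illustrative sketch, not the patch's code): capping addresses at {{0x3fffffffffffffffL}} (2^62 - 1) means the sum of any two in-range values is at most 2^63 - 2, so address arithmetic can never overflow a signed 64-bit long and bounds checks cannot be fooled:

```java
public class MaxAddressDemo {
    // Same cap as in the patch: 2^62 - 1, half of Long.MAX_VALUE.
    private static final long MAX_ADDRESS = 0x3fffffffffffffffL;

    // With both operands capped at MAX_ADDRESS, addr + length is at most
    // 2^63 - 2, which still fits in a signed long, so overflow is impossible.
    static boolean inBounds(long addr, long length) {
        return addr >= 0 && length >= 0 &&
               addr <= MAX_ADDRESS && length <= MAX_ADDRESS &&
               addr + length <= MAX_ADDRESS;
    }

    public static void main(String[] args) {
        System.out.println(inBounds(MAX_ADDRESS, 1));  // false: would exceed the cap
        System.out.println(inBounds(1L << 40, 4096));  // true
        // Without a cap, a naive "addr + length <= limit" check can be
        // defeated by wraparound:
        System.out.println(Long.MAX_VALUE - 1 + 4096 <= Long.MAX_VALUE); // true, due to overflow!
    }
}
```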

bq. nit: make a method rather than dup the below...:

ok :)

bq. Is logging open at DEBUG but close at TRACE lead to confusion? Stumped debugger?

{{MemoryManager#close}} is really only a unit test thing.  But you're right, let's make it
DEBUG since the open was DEBUG.

bq. The close has to let out an IOE? What is the caller going to do w/ this IOE? The ByteArrayMemoryManager
close error string construction is same as close on ProbingHashTable?

It's a (mis)feature of {{java.io.Closeable}}.  But I use that interface anyway, since Findbugs
knows to nag us about it if we forget the close.  A user defined interface wouldn't be known
to FindBugs (although maybe there are annotations these days?)

bq. I like the compromise put upon the Iterator (that resize is allowed while Iteration...)
Seems appropriate given where this is to be deployed.

Yeah, I think it will be useful.

bq. On TestMemoryManager, maybe parameterize so once through with ByteArrayMemoryManager and
then a run with the offheap implementation rather than have dedicated test for each: https://github.com/junit-team/junit/wiki/Parameterized-tests

That's pretty cool.  I think we should do that in a follow-on where we do more coverage stuff
as well, though...
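The idea behind the Parameterized runner is simply to run one suite of assertions once per implementation. As a framework-free sketch of the same shape (the {{Allocator}} interface and both implementations here are hypothetical stand-ins, not the patch's MemoryManager API):

```java
import java.util.Arrays;

public class ParameterizedStyleDemo {
    // Hypothetical stand-in for the interface under test.
    interface Allocator {
        long[] allocate(int n);   // toy operation: "allocate" n slots
        String name();
    }

    static final Allocator ON_HEAP = new Allocator() {
        public long[] allocate(int n) { return new long[n]; }
        public String name() { return "onHeap"; }
    };
    static final Allocator OFF_HEAP_SIM = new Allocator() {
        public long[] allocate(int n) { return new long[n]; } // simulated
        public String name() { return "offHeap"; }
    };

    // The same assertions run once per implementation; this loop is what
    // JUnit's @RunWith(Parameterized.class) + @Parameters automates.
    static void runSuite(Allocator a) {
        long[] block = a.allocate(8);
        if (block.length != 8) throw new AssertionError(a.name());
        System.out.println(a.name() + ": ok");
    }

    public static void main(String[] args) {
        for (Allocator a : Arrays.asList(ON_HEAP, OFF_HEAP_SIM)) {
            runSuite(a);
        }
    }
}
```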

bq. Yi wrote: It's better to assert maxLoadFactor < 1 (maybe < 0.8?); an incorrect value
will cause the hash table to fail.

Good idea.
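The check Yi suggests might look like this (a hypothetical constructor sketch, not the patch's code); a load factor at or above 1.0 means a full table, in which an open-addressing probe can loop forever looking for a free slot:

```java
public class LoadFactorCheckDemo {
    private final double maxLoadFactor;

    // Reject load factors that would make probing degenerate (>= 1 means
    // the table can fill completely) or that are nonsensical (<= 0).
    LoadFactorCheckDemo(double maxLoadFactor) {
        if (maxLoadFactor <= 0.0 || maxLoadFactor >= 1.0) {
            throw new IllegalArgumentException(
                "maxLoadFactor must be in (0, 1), got " + maxLoadFactor);
        }
        this.maxLoadFactor = maxLoadFactor;
    }

    public static void main(String[] args) {
        new LoadFactorCheckDemo(0.75); // fine
        try {
            new LoadFactorCheckDemo(1.5);
            System.out.println("no exception");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```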

bq. \[maintainCompactness\] looks brief, but I think it's not effective. putInternal needs
probing if the slot was not in the right place, so it's not effective.

{{putInternal}} does do probing, though.  Maybe I'm missing something but I think this should
work.  Also, I can tell from the log messages that {{maintainCompactness}} is getting some
testing.  I didn't like the original implementation because it was duplicating a lot of code
from {{putInternal}}.
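For context, here is the invariant a remove must maintain in a linearly probed, open-addressed table, and one standard way to restore it by re-inserting the entries that follow the freed slot. This is a generic sketch of the technique, under the assumption that {{maintainCompactness}} does something along these lines via {{putInternal}}'s probing; it is not the patch's code:

```java
import java.util.Arrays;

public class CompactRemoveDemo {
    static final int CAP = 8;                 // power of two
    static final long EMPTY = Long.MIN_VALUE; // sentinel for a free slot
    static long[] slots = new long[CAP];

    static int home(long key) { return (int) (key & (CAP - 1)); }

    static void put(long key) {
        int i = home(key);
        while (slots[i] != EMPTY) i = (i + 1) & (CAP - 1); // linear probe
        slots[i] = key;
    }

    // Remove key, then re-insert every entry in the probe run that follows
    // the freed slot, so no key becomes unreachable from its home slot.
    static void remove(long key) {
        int i = home(key);
        while (slots[i] != key) i = (i + 1) & (CAP - 1);
        slots[i] = EMPTY;
        int j = (i + 1) & (CAP - 1);
        while (slots[j] != EMPTY) {
            long k = slots[j];
            slots[j] = EMPTY;
            put(k);                           // putInternal-style re-probe
            j = (j + 1) & (CAP - 1);
        }
    }

    static boolean contains(long key) {
        for (int i = home(key), n = 0; n < CAP; i = (i + 1) & (CAP - 1), n++) {
            if (slots[i] == key) return true;
            if (slots[i] == EMPTY) return false;
        }
        return false;
    }

    public static void main(String[] args) {
        Arrays.fill(slots, EMPTY);
        put(1); put(9); put(17);  // all hash to slot 1, forming one probe run
        remove(9);                // frees slot 2; 17 must be shifted back
        System.out.println(contains(17)); // true only if compaction worked
    }
}
```

Without the re-insertion loop in {{remove}}, the lookup for 17 would hit the hole left at slot 2 and wrongly report the key as absent.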

bq. I think ByteArrayMemoryManager can only be used for tests, for performance reasons. If
SUN Unsafe is not available, we should use the current implementation on Hadoop trunk. We will
not remove the current implementation on trunk, right?

To my knowledge, all JVMs that are used in real Hadoop clusters have access to {{sun.Unsafe}}.
If we want to support a better on-heap memory allocator we can always work on that later.
A more efficient on-heap implementation would be to take a big byte array and basically hand
out offsets into it, much the way malloc itself does.  We're not going to keep around the old
BlockManager code because that would be impossible.
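A minimal sketch of the big-byte-array idea (an illustrative bump allocator, not a proposed implementation): "allocation" is just handing out 8-byte-aligned offsets into one backing array, as malloc hands out pointers into an arena:

```java
public class ByteArrayArenaDemo {
    // One big on-heap backing store; offsets into it play the role of addresses.
    static final byte[] arena = new byte[1 << 20];
    static int next = 0;

    // Reserve `size` bytes and return the offset of the region.
    static int alloc(int size) {
        int aligned = (size + 7) & ~7;  // 8-byte align, as malloc typically does
        if (next + aligned > arena.length) throw new OutOfMemoryError("arena full");
        int off = next;
        next += aligned;
        return off;
    }

    // Little-endian long accessors over the arena.
    static void putLong(int off, long v) {
        for (int i = 0; i < 8; i++) arena[off + i] = (byte) (v >>> (i * 8));
    }

    static long getLong(int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) v |= (arena[off + i] & 0xFFL) << (i * 8);
        return v;
    }

    public static void main(String[] args) {
        int a = alloc(8);
        int b = alloc(8);
        putLong(a, 123L);
        putLong(b, -1L);
        System.out.println(getLong(a)); // 123
        System.out.println(getLong(b)); // -1
    }
}
```

This avoids per-entry object headers entirely, at the cost of doing your own free-space management, which is why it is left as possible follow-on work.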

> Create an off-heap hash table implementation
> --------------------------------------------
>                 Key: HDFS-7844
>                 URL: https://issues.apache.org/jira/browse/HDFS-7844
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7836
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7844-scl.001.patch, HDFS-7844-scl.002.patch, HDFS-7844-scl.003.patch
> Create an off-heap hash table implementation.

This message was sent by Atlassian JIRA
