hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7846) Create off-heap BlocksMap and BlockData structures
Date Tue, 10 Mar 2015 21:00:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355701#comment-14355701 ]

Colin Patrick McCabe commented on HDFS-7846:
--------------------------------------------

bq. Yi asked: The original BlockInfo includes triplets which let us easily get the blocks
belonging to some DatanodeStorage.  In this patch, it seems not.  Will we use a different
approach in other JIRAs?  Maybe I need to wait for HDFS-7848.

That's a good question.  I've been looking into whether we actually need to support iterating
over all the blocks on a given datanode.  I think this requirement can be removed for
the full block report.  We can simply increment the epoch each time we get an FBR, and ignore
blocks which belong to old epochs.  In the background, we can have a scanner thread that collects
the garbage (shrinking block infos, or removing block infos that are unreferenced).  This frees
up memory that we would otherwise spend on the implicit linked lists.  It also
makes insertion and deletion easier and slightly faster.
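
To make the epoch idea concrete, here is a minimal sketch (the class and field names are illustrative, not from the patch):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of epoch-based full block report (FBR) handling.  Each entry
// records the epoch of the last FBR that mentioned it; a background
// scanner frees stale entries, so FBR processing never needs to iterate
// over all the blocks belonging to a datanode.
class DatanodeBlockSet {
  private long currentEpoch = 0;
  private final Map<Long, Entry> blocks = new HashMap<>();

  private static class Entry {
    long epoch;   // epoch of the last FBR that reported this block
    // ... off-heap offset, length, etc. would live here
  }

  synchronized void processFullBlockReport(List<Long> reportedBlockIds) {
    currentEpoch++;                                  // new FBR, new epoch
    for (long id : reportedBlockIds) {
      blocks.computeIfAbsent(id, k -> new Entry()).epoch = currentEpoch;
    }
    // No per-block deletions here; cleanup is deferred to the scanner.
  }

  synchronized void scanAndCollectGarbage() {
    // Background pass: drop entries not seen in the most recent FBR.
    blocks.values().removeIf(e -> e.epoch < currentEpoch);
  }
}
{code}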

The places where we still need some kind of full iteration are decommissioning and running
the balancer.  However, just scanning the hash tables should be good enough in these use-cases.
They are rare operations, and they require moving blocks around the cluster, which takes much
more time than iterating over an in-memory hash table, even when there are a lot of non-relevant
slots.  We rate-limit decommissioning anyway, and the limit is low enough that the extra overhead
of full table iteration is not important.
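
For reference, the kind of scan I mean is just a linear pass over every slot; a sketch, with made-up names (the real off-heap table's API may differ):

{code:java}
// Hypothetical linear scan of an open-addressed table whose slots hold
// off-heap entry offsets (NULL_OFFSET = empty).  Cost is O(capacity),
// including empty slots, which is fine for rare operations like
// decommissioning, where moving the blocks dominates anyway.
class TableScan {
  static final long NULL_OFFSET = 0;

  interface EntryVisitor {
    void visit(long entryOffset);
  }

  static void scan(long[] slots, EntryVisitor visitor) {
    for (long entryOffset : slots) {
      if (entryOffset == NULL_OFFSET) {
        continue;                 // skip non-relevant (empty) slots
      }
      visitor.visit(entryOffset); // e.g. check datanode, schedule work
    }
  }
}
{code}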

bq. 1. In BlockMap#Builder#build, duplicate check for this.mman != null

Fixed, thanks

bq. 2. Please add InterfaceAudience and InterfaceStability to public class

ok

bq. 3. unnecessary imports in newly added class files

fixed

bq. Another comment is about block replicas: the replication number of a file can be changed;
will we then re-allocate the off-heap memory for the blocks of the file?  A better way might be
to allocate separate off-heap memory for the replicas of a block, so we only need to change that part?

This data structure is optimized for minimizing memory consumption, rather than minimizing
copying of memory when things change.  I think that's the correct tradeoff to make, since the
replication factor of a file doesn't change that often.  We can also allocate space based
on the desired number of replicas, rather than the actual number.  So we don't need to re-allocate
when one replica is lost; we only re-allocate when the replication factor itself is changed.

To make this more concrete, keep in mind that with separate allocations, we'd have something
like this:
fixed-size block info -> replica info R1 -> replica info R2 -> replica info R3

Even with a singly (not doubly) linked list, that's still an extra 8 bytes in the fixed-size
block info for the pointer to R1, and an extra 8 bytes in each R for the pointer to the next.
It will also tend to fragment the heap a lot more, because we'll be making a lot of small
allocations (R1, R2, and R3 will each be around 24 bytes).

In contrast, having a variable-sized block info saves at least 32 bytes, because we have no
pointers.  It has better cache locality, because we're not hopping all over memory to read
the replica list.  And copying is only needed when the replication factor is changed by {{setReplication}}
or similar.  There is less fragmentation, because we make fewer, bigger calls to {{malloc}}.
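
As a concrete sketch of the contiguous layout (the field names and sizes here are illustrative, not the patch's actual format):

{code:java}
import java.nio.ByteBuffer;

// Hypothetical variable-sized block record: one contiguous allocation
// holding a fixed header plus one slot per *desired* replica.  There are
// no next-pointers, so 3 replicas avoid 8 + 3 * 8 = 32 bytes of pointers
// and 3 extra small mallocs compared to the linked layout above.
class BlockRecord {
  static final int HEADER = 8 + 8 + 8 + 4; // blockId, genStamp, numBytes, replCount
  static final int SLOT = 12;              // e.g. storageId:8 + state:4 per replica

  // Sized by the desired replication factor, so losing a replica never
  // forces a re-allocation; only setReplication() does.
  static int sizeFor(int desiredReplication) {
    return HEADER + desiredReplication * SLOT;
  }

  // Replica access is direct indexing into the same buffer: good cache
  // locality, no pointer chasing.
  static long storageIdOfReplica(ByteBuffer record, int i) {
    return record.getLong(HEADER + i * SLOT);
  }
}
{code}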

bq. charles wrote: Yi mentioned unused imports, but there are also unnecessary java.lang.{String,ClassCastException}

thanks, removed

bq. BlockId.equals: constructing a ClassCastException, and especially the resulting call to
fillInStackTrace, is an expensive way of checking the type. I would think instanceof is preferred.

This code path is never expected to actually be taken, though.  You shouldn't be comparing
BlockId objects with non-BlockId objects.  If Java 1.0 had been designed with generics, equals
would probably be {{equals(T t) where T extends MyType}}.  Anyway, I used the standard Java
{{equals(Object)}} interface because creating my own version of equals seemed excessive.
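
For reference, the {{instanceof}}-based variant the reviewer suggests would look something like this (a sketch; the {{id}} field is assumed for illustration):

{code:java}
// Sketch of an instanceof-based equals, per the review suggestion.  The
// type check costs nothing when it fails, unlike constructing a
// ClassCastException (which calls fillInStackTrace).
class BlockId {
  private final long id;   // assumed field, for illustration

  BlockId(long id) {
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof BlockId)) {
      return false;        // cheap type check, no exception allocated
    }
    BlockId other = (BlockId) o;
    return this.id == other.id;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(id);  // keep hashCode consistent with equals
  }
}
{code}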

bq. Are you planning on doing something with Shard.name in the future?

It will be used for logging in the future.

bq. The indentation of the assignment to htable is off a bit.

It seems ok to me?

bq. Jenkins will ask you this question, but why no unit tests?

Fair question.  I'm trying to write this as incrementally as I can, and it seemed like unit
tests could be done in a follow-on.  The bulk of the work will be integrating this
into the main NN code.

bq. Oh, I forgot to mention there are three places where git apply flags the patch for adding
trailing whitespace.

ok

> Create off-heap BlocksMap and BlockData structures
> --------------------------------------------------
>
>                 Key: HDFS-7846
>                 URL: https://issues.apache.org/jira/browse/HDFS-7846
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7836
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7846-scl.001.patch
>
>
> Create off-heap BlocksMap, BlockInfo, and DataNodeInfo structures.  The BlocksMap will
> use the off-heap hash table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
