hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
Date Tue, 28 Jul 2015 19:28:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644896#comment-14644896 ]

Haohui Mai commented on HDFS-8823:
----------------------------------

I should have worded it more clearly. The main motivation of this work is to further separate
the block management layer from the namespace. It is a prerequisite for putting the block
management layer under a separate lock, so that processing block reports no longer blocks
namespace operations.

As a side effect, the changes potentially enable a per-block replication factor. However, there
are no plans to support it, nor any plans to expose it through the APIs.

As for memory usage, here is the object layout of the {{BlockInfo}} class before and after
the changes:

Before:

{noformat}
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

VM fails to invoke the default constructor, falling back to class-only introspection.

BlockInfo object internals:
 OFFSET  SIZE            TYPE DESCRIPTION                    VALUE
      0    12                 (object header)                N/A
     12     4                 (alignment/padding gap)        N/A
     16     8            long Block.blockId                  N/A
     24     8            long Block.numBytes                 N/A
     32     8            long Block.generationStamp          N/A
     40     4 BlockCollection BlockInfo.bc                   N/A
     44     4   LinkedElement BlockInfo.nextLinkedElement    N/A
     48     4        Object[] BlockInfo.triplets             N/A
     52     4                 (loss due to the next object alignment)
Instance size: 56 bytes (estimated, the sample instance is not available)
Space losses: 4 bytes internal + 4 bytes external = 8 bytes total
{noformat}

After:

{noformat}
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

VM fails to invoke the default constructor, falling back to class-only introspection.

objc[86584]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/bin/java
and /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/libinstrument.dylib.
One of the two will be used. Which one is undefined.
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo object internals:
 OFFSET  SIZE            TYPE DESCRIPTION                    VALUE
      0    12                 (object header)                N/A
     12     4                 (alignment/padding gap)        N/A
     16     8            long Block.blockId                  N/A
     24     8            long Block.numBytes                 N/A
     32     8            long Block.generationStamp          N/A
     40     2           short BlockInfo.replication          N/A
     42     2                 (alignment/padding gap)        N/A
     44     4 BlockCollection BlockInfo.bc                   N/A
     48     4   LinkedElement BlockInfo.nextLinkedElement    N/A
     52     4        Object[] BlockInfo.triplets             N/A
Instance size: 56 bytes (estimated, the sample instance is not available)
Space losses: 6 bytes internal + 0 bytes external = 6 bytes total
{noformat}

The changes add a {{short}} field to the {{BlockInfo}} class. Under my configuration (64-bit
HotSpot, Java 1.8, compressed oops) the two extra bytes are absorbed by padding that the 8-byte
object alignment already required, so the instance size stays at 56 bytes and there is no memory
overhead compared to the current implementation. YMMV.
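
For anyone who wants to check the arithmetic, here is a minimal sketch that reproduces the two
instance sizes from the dumps above. It is a simplified model, not HotSpot's actual field layout
algorithm: it assumes a 12-byte header, places each field in the listed order at an offset
aligned to its own size, and rounds the total up to 8 bytes. The class and method names
({{LayoutSketch}}, {{instanceSize}}) are made up for illustration.

```java
// Simplified model of the layouts shown in the dumps above (NOT the real HotSpot algorithm).
final class LayoutSketch {
    static final int HEADER_BYTES = 12;    // object header size reported in the dumps
    static final int OBJECT_ALIGNMENT = 8; // "Objects are 8 bytes aligned."

    // Round offset up to the next multiple of alignment.
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Place fields in the given order, each aligned to its own size,
    // then pad the whole object to the object alignment.
    static int instanceSize(int... fieldSizes) {
        int offset = HEADER_BYTES;
        for (int size : fieldSizes) {
            offset = alignUp(offset, size) + size;
        }
        return alignUp(offset, OBJECT_ALIGNMENT);
    }

    public static void main(String[] args) {
        // Before: three longs (Block), then three 4-byte compressed references (BlockInfo).
        int before = instanceSize(8, 8, 8, 4, 4, 4);
        // After: the same fields plus the new 2-byte short ahead of the references.
        int after = instanceSize(8, 8, 8, 2, 4, 4, 4);
        System.out.println("before = " + before + " bytes, after = " + after + " bytes");
    }
}
```

In this model the short lands in the 4 bytes that were previously lost to the next object
alignment, which is why both cases come out at 56 bytes. With different field sizes (for
example, without compressed oops) the result can differ.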

> Move replication factor into individual blocks
> ----------------------------------------------
>
>                 Key: HDFS-8823
>                 URL: https://issues.apache.org/jira/browse/HDFS-8823
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: HDFS-8823.000.patch
>
>
> This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes
> have two advantages:
> * Decoupling the namespace and the block management layer. It is a prerequisite step
> to move block management off the heap or to a separate process.
> * Increased flexibility in replicating blocks. Currently the replication factors of all
> blocks in a file have to be the same, and equal to the highest replication factor across all
> snapshots. The changes will allow blocks in a file to have different replication factors,
> potentially saving some space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
