hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
Date Tue, 14 Jul 2015 03:03:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625742#comment-14625742

Andrew Wang commented on HDFS-8728:

Hi Zhe, I took a look at patch #3 and a few other things in the process. Nice work here, my
review comments follow (most can be deferred to post-merge follow-ons if you prefer):

* Missing Apache header on BlockInfoStriped
* Some lines longer than 80 chars in BlockInfoUnderConstructionStriped
* Need class javadoc for StripedBlockStorageOp
* TestStripedBlockUtil is included in patch 9, but StripedBlockUtil is added in patch 3

* BlockInfoUnderConstructionStriped's constructor should do a Precondition check rather than
an assert
* Should BlockInfoUC#setBlockUCState also do Precondition checks on the set value?
* I like the "op" idea to pull out shared functionality. Some of the calls look like boilerplate
though. One idea, we add an abstract protected "getOp" to BlockInfo, then isStriped, addStorage,
removeStorage, replaceBlock, and numNodes can be implemented via {{getOp().addStorage(storage,
reportedBlock)}} and so on. If we make a singleton for ContiguousBlockStorageOp, the contiguous
classes can implement getOp by returning the singleton to still have constant overhead.
* BIUCS#setTruncateBlock, shall we throw an UnsupportedOperationException rather than just
a warn? If it's not allowed, we should hard error.
* Could save a reference in StripedBlockStorageOp by making the caller pass itself when necessary;
this might actually be important since it's one per BlockInfoStriped.
* There are more complicated ideas for optimizing the memory usage of BlockInfoStriped too;
we could avoid indices entirely sometimes, or pack it into something smaller than a byte.
Those are probably follow-on.
* BlockInfo#numNodes should be renamed to BlockInfo#numStorages for accuracy, but that's not
related to this patch.

* ErasureCodingZoneManager needs to unpacks the xattr each time it needs to check the schema.
Decoding PB is not cheap. EncryptionZoneManager keeps the encryption zones in a map rather
than unpacking each time, it only needs to unpack during startup when reading the FSImage
and edits. Let's do the same thing here.
* Symlink support doesn't require anything special for an EC, I think if you just delete the
if statement in getErasureCodingZone it'll work.
* SchemaLoader, is this class used anywhere? I didn't see any usages.
* SchemaLoader should also be made a static nested class, e.g. SchemaLoader.Loader, since
there's no state.

* SequentialBlockGroupIdGenerator lacks unit tests. Also need to cross-reference with SequentialBlockIdGenerator's

> Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
> -------------------------------------------------------------------
>                 Key: HDFS-8728
>                 URL: https://issues.apache.org/jira/browse/HDFS-8728
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728.00.patch,
HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch,
Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch,
Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch

This message was sent by Atlassian JIRA

View raw message