hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9837) BlockManager#countNodes should be able to detect duplicated internal blocks
Date Tue, 23 Feb 2016 15:02:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159001#comment-15159001 ]

Rakesh R commented on HDFS-9837:

Thanks [~jingzhao] for the work. The {{EnumCounters<StoredReplicaState>}} idea looks pretty
good. I have a few comments about the changes; please take a look.

# Shouldn't this be {{StoredReplicaState.DECOMMISSIONED}}? The node here has already finished decommissioning:
       } else if (node.isDecommissioned()) {
-        decommissioned++;
+        counters.add(StoredReplicaState.DECOMMISSIONING, 1);
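
The per-state counting idea can be illustrated with a plain {{EnumMap}} standing in for {{EnumCounters}}. The enum values and the flag below are assumptions for the sketch, not the actual HDFS types; it only shows why a decommissioned node must bump the DECOMMISSIONED counter, not DECOMMISSIONING:

```java
import java.util.EnumMap;

public class ReplicaCounterSketch {
    // Hypothetical stand-in for HDFS's StoredReplicaState enum.
    enum StoredReplicaState { LIVE, DECOMMISSIONING, DECOMMISSIONED, CORRUPT, EXCESS }

    public static void main(String[] args) {
        EnumMap<StoredReplicaState, Integer> counters =
            new EnumMap<>(StoredReplicaState.class);
        for (StoredReplicaState s : StoredReplicaState.values()) {
            counters.put(s, 0);
        }
        // A node that has finished decommissioning must be counted under
        // DECOMMISSIONED, not DECOMMISSIONING -- the bug flagged above.
        boolean nodeIsDecommissioned = true;
        if (nodeIsDecommissioned) {
            counters.merge(StoredReplicaState.DECOMMISSIONED, 1, Integer::sum);
        }
        System.out.println(counters.get(StoredReplicaState.DECOMMISSIONED));  // 1
        System.out.println(counters.get(StoredReplicaState.DECOMMISSIONING)); // 0
    }
}
```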
# I can see {{BlockInfoStriped#findSlot()}} expanding the capacity beyond {{#getTotalBlockNum}}.
While counting the replicas, the code checks only up to {{getTotalBlockNum()}}, so it will miss
the replicas stored in the expanded slots.

+  private void countReplicasForStripedBlock(
+      EnumCounters<StoredReplicaState> counters, BlockInfoStriped block,
+      Collection<DatanodeDescriptor> nodesCorrupt, boolean inStartupSafeMode) {
+    BitSet bitSet = new BitSet(block.getTotalBlockNum());

  private int findSlot() {
    int i = getTotalBlockNum();
    for (; i < getCapacity(); i++) {
      if (getStorageInfo(i) == null) {
        return i;
      }
    }
    // need to expand the storage size
    ensureCapacity(i + 1, true);
    return i;
  }
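
To illustrate the concern: once {{findSlot()}} has expanded the capacity, a replica can live in a slot at or beyond {{getTotalBlockNum()}}, and a counting loop bounded by {{getTotalBlockNum()}} skips it. A minimal sketch with plain arrays (the names and values are illustrative, not the HDFS fields):

```java
public class SlotBoundSketch {
    public static void main(String[] args) {
        int totalBlockNum = 9;
        // Capacity expanded by a findSlot()-style call; slot 9 holds a
        // storage beyond totalBlockNum.
        String[] storages = new String[10];
        for (int i = 0; i < 9; i++) storages[i] = "dn" + i;
        storages[9] = "dn9"; // placed in an expanded slot

        // Loop bounded by totalBlockNum: misses the storage in slot 9.
        int countedUpToTotal = 0;
        for (int i = 0; i < totalBlockNum; i++) {
            if (storages[i] != null) countedUpToTotal++;
        }
        // Loop bounded by the full capacity: sees every storage.
        int countedUpToCapacity = 0;
        for (int i = 0; i < storages.length; i++) {
            if (storages[i] != null) countedUpToCapacity++;
        }
        System.out.println(countedUpToTotal + " vs " + countedUpToCapacity); // 9 vs 10
    }
}
```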
# I'd prefer using {{#getCapacity()}} instead of {{storages.length}}. Also, use
{{#getStorageInfo(index)}} instead of {{storages[index]}} and {{storages[i]}}.

> BlockManager#countNodes should be able to detect duplicated internal blocks
> ---------------------------------------------------------------------------
>                 Key: HDFS-9837
>                 URL: https://issues.apache.org/jira/browse/HDFS-9837
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-9837.000.patch, HDFS-9837.001.patch, HDFS-9837.002.patch
> Currently {{BlockManager#countNodes}} only counts the number of replicas/internal blocks,
> so it cannot detect the under-replicated scenario where a striped EC block has 9 internal
> blocks but contains duplicated data/parity blocks. E.g., b8 is missing while two copies of b0 exist:
> # create an EC file
> # kill DN1 and wait for the reconstruction to happen
> # start DN1 again
> # kill DN2 and restart NN immediately
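
The duplicated-internal-block scenario in the description can be sketched with a {{BitSet}} over internal block indices, mirroring the {{BitSet(block.getTotalBlockNum())}} idea in the patch. The reported-index array below is an assumption for illustration, not actual NameNode state:

```java
import java.util.BitSet;

public class StripedDuplicateSketch {
    public static void main(String[] args) {
        int totalBlockNum = 9; // e.g. RS-6-3: 6 data + 3 parity internal blocks
        // Reported internal block indices from the description:
        // b8 is missing while b0 appears twice.
        int[] reportedIndices = {0, 1, 2, 3, 4, 5, 6, 7, 0};

        BitSet seen = new BitSet(totalBlockNum);
        int redundant = 0;
        for (int idx : reportedIndices) {
            if (seen.get(idx)) {
                redundant++;   // duplicate internal block (the second b0)
            } else {
                seen.set(idx);
            }
        }
        int live = seen.cardinality(); // distinct internal blocks present
        // 9 replicas reported, but only 8 distinct indices: under-replicated.
        System.out.println("live=" + live + " redundant=" + redundant);
    }
}
```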

This message was sent by Atlassian JIRA
