hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-9540) Handle SafeModeException in ReportBadBlockAction#reportTo
Date Fri, 11 Dec 2015 16:13:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052944#comment-15052944 ]

Vinayakumar B edited comment on HDFS-9540 at 12/11/15 4:12 PM:
---------------------------------------------------------------

{code}  /**
   * Client is reporting some bad block locations.
   */
  void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
    checkOperation(OperationCategory.WRITE);
    NameNode.stateChangeLog.info("*DIR* reportBadBlocks");
    writeLock();
    try {
      checkOperation(OperationCategory.WRITE);
      for (int i = 0; i < blocks.length; i++) {
        ExtendedBlock blk = blocks[i].getBlock();
        DatanodeInfo[] nodes = blocks[i].getLocations();
        String[] storageIDs = blocks[i].getStorageIDs();
        for (int j = 0; j < nodes.length; j++) {
          blockManager.findAndMarkBlockAsCorrupt(blk, nodes[j],
              storageIDs == null ? null: storageIDs[j], 
              "client machine reported it");
        }
      }
    } finally {
      writeUnlock();
    }
  }{code}

According to the above code snippet from FSNamesystem, which is responsible for handling bad block reports, there is no safemode check, and hence no SafeModeException is expected from this path.
Is the check really required? IMO, no, since this is not an update to the filesystem, but just marking the block as corrupt.
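
For contrast, here is a hypothetical guarded variant (a sketch only, not the actual code): checkNameNodeSafeMode(..) is the guard that other FSNamesystem write paths invoke, and it is exactly the call that is absent above.
{code}
  // Hypothetical sketch: if a safemode guard were required, reportBadBlocks
  // would follow the same pattern as other FSNamesystem write operations and
  // call checkNameNodeSafeMode(..), which throws SafeModeException while the
  // NameNode is in safemode.
  void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
    checkOperation(OperationCategory.WRITE);
    writeLock();
    try {
      checkOperation(OperationCategory.WRITE);
      checkNameNodeSafeMode("Cannot report bad blocks"); // the missing guard
      // ... same block-marking loop as above ...
    } finally {
      writeUnlock();
    }
  }
{code}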

Agree, [~kihwal] and [~yzhangal]?


> Handle SafeModeException in ReportBadBlockAction#reportTo
> ---------------------------------------------------------
>
>                 Key: HDFS-9540
>                 URL: https://issues.apache.org/jira/browse/HDFS-9540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> BPServiceActor#processQueueMessages() tries to execute ReportBadBlockAction#reportTo(..),
> and on any exception it will add the request back to the queue.
> For StandbyException, this caused HDFS-7916: a request kept being added back to
> the queue when it should not have been.
> The HDFS-7916 fix treats all exceptions wrapped by RemoteException the same, including
> StandbyException. That is, when a RemoteException is caught, the request is not added
> back to the queue.
> This solved the StandbyException issue reported in HDFS-7916, but the side effect is
> that the request is not added back to the queue for a SafeModeException wrapped by a
> RemoteException either, which appears to be incorrect.
> Thanks [~vinayrpet] and [~kihwal] for the discussion in HDFS-7916.
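
For reference, one possible direction (a sketch only, under the assumption that a SafeModeException should be treated as retriable; not the committed fix) is to unwrap the RemoteException inside ReportBadBlockAction#reportTo and re-queue only the safemode case:
{code}
  try {
    bpNamenode.reportBadBlocks(locatedBlock);
  } catch (RemoteException re) {
    // Unwrap to see what the NameNode actually threw.
    IOException unwrapped = re.unwrapRemoteException(SafeModeException.class);
    if (unwrapped instanceof SafeModeException) {
      // Assumption: safemode is transient, so throwing
      // BPServiceActorActionException makes processQueueMessages()
      // add this action back to the queue for a later retry.
      throw new BPServiceActorActionException(
          "Failed to report bad block " + block + ": NameNode is in safemode");
    }
    // StandbyException and other remote errors: keep the HDFS-7916
    // behaviour and do not add the request back to the queue.
    DataNode.LOG.info("reportBadBlock encountered RemoteException for block "
        + block, re);
  } catch (IOException e) {
    throw new BPServiceActorActionException(
        "Failed to report bad block " + block + " to namenode");
  }
{code}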



