hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR
Date Wed, 07 Mar 2018 18:35:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389759#comment-16389759
] 

Uma Maheswara Rao G edited comment on HDFS-13165 at 3/7/18 6:34 PM:
--------------------------------------------------------------------

[~rakeshr] sorry for the delay in reviews. Was in travels last month, could not get time.

Thank you for all the hard work. Please find my initial feedback.

1.
{code:java}
-      // To avoid choosing this excludeNodes as targets later
-      excludeNodes.add(existingTypeNodePair.dn);
     }
{code}
 Why not adding to exclude nodes?

2.
{code:java}
 if (dnInfo.size() <= 0) {
          movementFinishedBlocks.add(reportedBlock);
        } 
{code}
I think you should remove block from scheduledBlkLocs as well? Otherwise it will be leaked?

 3. For starting time, if SPS is none, should we even need to create SPSManager object?

    If no, need to check spa manager null checks.

 4. not related to this patch:
{code:java}
 // Check that SPS daemon service is running inside namenode
    if (namesystem.getBlockManager().getSPSManager()
        .getMode() == StoragePolicySatisfierMode.INTERNAL) {
      LOG.debug("SPS service is internally enabled and running inside "
          + "namenode, so external SPS is not allowed to fetch the path Ids");
      throw new IOException("SPS service is internally enabled and running"
          + " inside namenode, so external SPS is not allowed to fetch"
          + " the path Ids");
    }
{code}
  I think you should just make sure mode external, otherwise throw exception?

  But now, seems we will return if mode is none?


was (Author: umamaheswararao):
[~rakeshr] sorry for the delay in reviews. Was in travels last month, could not get time.

Thank you for all the hard work. Please find my initial feedback.

1.

 
{code:java}
-      // To avoid choosing this excludeNodes as targets later
-      excludeNodes.add(existingTypeNodePair.dn);
     }
{code}
 

Why not adding to exclude nodes?

2.
{code:java}
 if (dnInfo.size() <= 0) {
          movementFinishedBlocks.add(reportedBlock);
        } 
{code}
I think you should remove block from scheduledBlkLocs as well? Otherwise it will be leaked?

 3. For starting time, if SPS is none, should we even need to create SPSManager object?

    If no, need to check spa manager null checks.

 4. not related to this patch:

 
{code:java}
 // Check that SPS daemon service is running inside namenode
    if (namesystem.getBlockManager().getSPSManager()
        .getMode() == StoragePolicySatisfierMode.INTERNAL) {
      LOG.debug("SPS service is internally enabled and running inside "
          + "namenode, so external SPS is not allowed to fetch the path Ids");
      throw new IOException("SPS service is internally enabled and running"
          + " inside namenode, so external SPS is not allowed to fetch"
          + " the path Ids");
    }
{code}
 

 I think you should just make sure mode external, otherwise throw exception?

But now, seems we will return if mode is none?

> [SPS]: Collects successfully moved block details via IBR
> --------------------------------------------------------
>
>                 Key: HDFS-13165
>                 URL: https://issues.apache.org/jira/browse/HDFS-13165
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
>         Attachments: HDFS-13165-HDFS-10285-00.patch, HDFS-13165-HDFS-10285-01.patch,
HDFS-13165-HDFS-10285-02.patch, HDFS-13165-HDFS-10285-03.patch
>
>
> This task to make use of the existing IBR to get moved block details and remove unwanted
future tracking logic exists in BlockStorageMovementTracker code, this is no more needed as
the file level tracking maintained at NN itself.
> Following comments taken from HDFS-10285, [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]
> Comment-3)
> {quote}BPServiceActor
> Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}
> Comment-21)
> {quote}
> BlockStorageMovementTracker
> Many data structures are riddled with non-threadsafe race conditions and risk of CMEs.
> Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's list of futures
is synchronized. However the run loop does an unsynchronized block get, unsynchronized future
remove, unsynchronized isEmpty, possibly another unsynchronized get, only then does it do
a synchronized remove of the block. The whole chunk of code should be synchronized.
> Is the problematic moverTaskFutures even needed? It's aggregating futures per-block for
seemingly no reason. Why track all the futures at all instead of just relying on the completion
service? As best I can tell:
> It's only used to determine if a future from the completion service should be ignored
during shutdown. Shutdown sets the running boolean to false and clears the entire datastructure
so why not use the running boolean like a check just a little further down?
> As synchronization to sleep up to 2 seconds before performing a blocking moverCompletionService.take,
but only when it thinks there are no active futures. I'll ignore the missed notify race that
the bounded wait masks, but the real question is why not just do the blocking take?
> Why all the complexity? Am I missing something?
> BlocksMovementsStatusHandler
> Suffers same type of thread safety issues as StoragePolicySatisfyWorker. Ex. blockIdVsMovementStatus
is inconsistent synchronized. Does synchronize to return an unmodifiable list which sadly
does nothing to protect the caller from CME.
> handle is iterating over a non-thread safe list.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message