hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets
Date Mon, 02 Feb 2009 23:33:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669800#action_12669800 ]

Konstantin Shvachko commented on HADOOP-5124:

# {{computeInvalidateWork()}}
## You probably want to use {{Math.min()}} when computing the value of {{nodesToProcess}}.
## I would rather go with
{{ArrayList<String> keyArray = new ArrayList<String>(recentInvalidateSets.keySet());}}
than {{String[] keyArray}}. That way you can use {{Collections.swap()}} instead of implementing
the swap yourself.
Ideally, of course, it would be better to just get a random element from the TreeMap and put
it into the array list.
# {{invalidateWorkForOneNode()}}
      recentInvalidateSets.put(firstNodeId, invalidateSet);
is a no-op in your case, because {{recentInvalidateSets}} already contains {{firstNodeId}}
mapped to exactly this {{invalidateSet}}, which was modified earlier in the loop.
The original variant of this code makes more sense, since it removes the entire node once it
has no invalid blocks left.
# Could you please run some tests showing how much of an optimization we gain from randomizing
the data-node selection?
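The suggestion in point 1 can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual FSNamesystem code: the map contents, the {{blockInvalidateLimit}} value, and the key type are hypothetical stand-ins, and only the {{Math.min()}} cap and the {{ArrayList}} + {{Collections.swap()}} selection reflect the review comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

public class RandomNodeSelection {
    public static void main(String[] args) {
        // Hypothetical stand-in for recentInvalidateSets: node id -> invalidate set.
        TreeMap<String, String> recentInvalidateSets = new TreeMap<>();
        recentInvalidateSets.put("dn1", "blk_1");
        recentInvalidateSets.put("dn2", "blk_2");
        recentInvalidateSets.put("dn3", "blk_3");
        recentInvalidateSets.put("dn4", "blk_4");

        // Math.min() caps the work at the number of nodes actually in the map.
        int blockInvalidateLimit = 2; // hypothetical per-iteration limit
        int nodesToProcess = Math.min(blockInvalidateLimit,
                                      recentInvalidateSets.size());

        // Copy the keys into an ArrayList so Collections.swap() can be used
        // instead of a hand-rolled swap over a String[].
        List<String> keyArray = new ArrayList<>(recentInvalidateSets.keySet());
        Random r = new Random();
        for (int i = 0; i < nodesToProcess; i++) {
            // Partial Fisher-Yates: swap a random not-yet-chosen key into slot i.
            Collections.swap(keyArray, i, i + r.nextInt(keyArray.size() - i));
            System.out.println("process " + keyArray.get(i));
        }
    }
}
```

Each of the first {{nodesToProcess}} slots ends up holding a uniformly chosen, distinct key, which gives every datanode an equal chance of being dispatched to, rather than always starting from the first one.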

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map.
Instead it traverses only the nodes on which the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a
predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides
fairness to all datanodes, whereas the current strategy always starts from the first datanode.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
