hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7967) Reduce the performance impact of the balancer
Date Fri, 20 Mar 2015 23:22:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372305#comment-14372305 ]

Daryn Sharp commented on HDFS-7967:
-----------------------------------

The current implementation is so bad that on large clusters we have to restrict the balancer
to using only one thread for block queries.  Multiple threads will destroy the performance
of busy namenodes by causing call queue overflows.

> Reduce the performance impact of the balancer
> ---------------------------------------------
>
>                 Key: HDFS-7967
>                 URL: https://issues.apache.org/jira/browse/HDFS-7967
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>
> The balancer needs to query for blocks to move from overly full DNs.  The block lookup
> is extremely inefficient.  An iterator of the node's blocks is created from the iterators
> of its storages' blocks.  A random number is chosen corresponding to how many blocks will
> be skipped via the iterator.  Each skip requires costly scanning of triplets.
> The current design also only considers node imbalances while ignoring imbalances within
> the node's storages.  A more efficient and intelligent design may eliminate the costly skipping
> of blocks via round-robin selection of blocks from the storages based on remaining capacity.
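
The round-robin idea in the description could look roughly like the sketch below. This is
not the HDFS implementation or the eventual patch. The class RoundRobinBlockPicker, the
Storage type, and the pick method are hypothetical names, and reading "based on remaining
capacity" as "visit the storage with the least remaining space first" is an assumption. The
point it illustrates is that each storage keeps its own cursor, so no blocks are skipped
through a combined iterator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not HDFS classes. */
public class RoundRobinBlockPicker {

    /** Hypothetical stand-in for one storage (volume) on a datanode. */
    public static final class Storage {
        final String id;
        final long remainingBytes;
        final List<String> blocks;

        public Storage(String id, long remainingBytes, List<String> blocks) {
            this.id = id;
            this.remainingBytes = remainingBytes;
            this.blocks = blocks;
        }
    }

    /**
     * Picks up to maxBlocks block ids by cycling over the storages.
     * Assumption: storages with the least remaining capacity are visited first.
     */
    public static List<String> pick(List<Storage> storages, int maxBlocks) {
        // Order the storages from fullest (least remaining) to emptiest.
        List<Storage> ordered = new ArrayList<>(storages);
        ordered.sort(Comparator.comparingLong((Storage s) -> s.remainingBytes));

        // One cursor per storage; each advances independently, so nothing is skipped.
        List<Iterator<String>> cursors = new ArrayList<>();
        for (Storage s : ordered) {
            cursors.add(s.blocks.iterator());
        }

        List<String> picked = new ArrayList<>();
        while (picked.size() < maxBlocks && !cursors.isEmpty()) {
            Iterator<Iterator<String>> round = cursors.iterator();
            while (round.hasNext() && picked.size() < maxBlocks) {
                Iterator<String> cursor = round.next();
                if (cursor.hasNext()) {
                    picked.add(cursor.next()); // take one block, then move to the next storage
                } else {
                    round.remove();            // storage exhausted; drop it from the rotation
                }
            }
        }
        return picked;
    }
}
```

Calling pick with the storages of one overly full node and the number of blocks wanted
returns at most that many block ids, spread across the storages instead of drawn from a
single scan. A real design would also need to weight the rotation by how imbalanced each
storage is, which this sketch does not attempt.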



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
