hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4116) DataNode : idle rebalancing operations need not take up threads.
Date Fri, 12 Sep 2008 18:28:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630647#action_12630647 ]

Hairong Kuang commented on HADOOP-4116:

A closer investigation of the problem shows that the balancer needs additional improvements:

(1) The balancer needs to handle block move timeouts better. Currently it simply assumes that a timed-out move has failed, but it makes no effort to ensure that the move is interrupted and that the resources the move holds are released. The next scheduling phase may then schedule more block moves from the same DataNode, which consumes more and more resources.
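Point (1) can be illustrated with a minimal Java sketch. It assumes a hypothetical `TimedMover` that runs each move on its own thread. When a move times out, the mover interrupts it rather than abandoning it. The move's slot is released in the worker's own `finally` block, so the in-flight count only drops once the move has actually stopped. `TimedMover`, `inFlight`, and `canSchedule()` are illustrative names, not Hadoop's balancer API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Hadoop code): a block move that is interrupted on
 * timeout, with its per-DataNode slot released only when the worker exits.
 */
class TimedMover {
    private final int maxInFlight;
    // Moves currently charged against one source DataNode.
    final AtomicInteger inFlight = new AtomicInteger();

    TimedMover(int maxInFlight) { this.maxInFlight = maxInFlight; }

    /** The next scheduling phase consults this before adding more moves. */
    boolean canSchedule() { return inFlight.get() < maxInFlight; }

    /** Returns true only if the move completed without error inside the timeout. */
    boolean moveWithTimeout(Runnable move, long timeoutMs, long graceMs) {
        AtomicBoolean ok = new AtomicBoolean(false);
        inFlight.incrementAndGet();
        Thread worker = new Thread(() -> {
            try {
                move.run();
                ok.set(true);
            } catch (RuntimeException e) {
                // The move failed; fall through so the slot is still released.
            } finally {
                inFlight.decrementAndGet(); // freed only when the move really stops
            }
        });
        worker.start();
        try {
            worker.join(timeoutMs);
            if (worker.isAlive()) {
                worker.interrupt();   // actively abort the move instead of abandoning it
                worker.join(graceMs); // give it a chance to clean up
                return false;
            }
        } catch (InterruptedException e) {
            worker.interrupt();
            Thread.currentThread().interrupt();
            return false;
        }
        return ok.get();
    }
}
```

A move that ignores the interrupt keeps its slot until it exits. As a result, a scheduler that checks `canSchedule()` cannot keep piling new moves onto a DataNode whose earlier moves are still running in the background.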

(2) Resource control for balancing at DataNodes should use a fair Semaphore. Currently it uses an unfair Semaphore, which makes no guarantee about the order in which threads acquire permits: a thread invoking acquire() can be allocated a permit ahead of a thread that has been waiting. Therefore, if a DFS cluster has many DataNodes with long queues of block move requests, it is very likely to enter the following state: a thread on DataNode A holds a permit and asks DataNode B to receive a block, while a thread on DataNode B holds a permit and asks DataNode A to receive a block. Although the move from B to A was scheduled much later than the move from A to B, the two may execute simultaneously. Assuming only one permit can be issued per DataNode, both block receives are blocked waiting to acquire a permit, and a deadlock occurs.
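The ordering difference in point (2) comes from `java.util.concurrent.Semaphore` itself. The one-argument constructor creates an unfair semaphore, while `new Semaphore(permits, true)` grants permits in FIFO order. The hypothetical `FairnessDemo` below recreates a one-permit throttle in which an earlier request is already queued. It then lets a later request try for the freed permit. It is an illustration under those assumptions, not Hadoop code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical demo (not Hadoop code): does a late-arriving request get the
 * freed balancing permit ahead of a thread that is already waiting?
 */
class FairnessDemo {
    /** Returns true if the latecomer got the permit ahead of the earlier waiter. */
    static boolean latecomerWins(boolean fair) {
        try {
            return runScenario(fair);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    private static boolean runScenario(boolean fair) throws InterruptedException {
        Semaphore balancingSem = new Semaphore(1, fair); // one permit, as assumed above
        balancingSem.acquire();                          // an in-progress move holds it

        CountDownLatch done = new CountDownLatch(1);
        Thread earlyRequest = new Thread(() -> {
            try {
                balancingSem.acquire(); // scheduled first, so it queues first
                done.await();           // keep the permit until the demo finishes
                balancingSem.release();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        earlyRequest.start();
        while (!balancingSem.hasQueuedThreads()) {
            Thread.yield();             // wait until the early request is queued
        }

        balancingSem.release();         // the in-progress move completes
        // The timed tryAcquire honours the fairness policy; untimed tryAcquire() would not.
        boolean won = balancingSem.tryAcquire(0, TimeUnit.MILLISECONDS);
        if (won) {
            balancingSem.release();     // hand the permit back so the early request proceeds
        }

        done.countDown();
        earlyRequest.join();
        return won;
    }
}
```

With `fair = true` the latecomer never wins. With `fair = false` the outcome depends on thread timing, which is the barging described above. The untimed `tryAcquire()` ignores the fairness setting even on a fair semaphore, so code that relies on FIFO order should use `acquire()` or the timed `tryAcquire`.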

> DataNode : idle rebalancing operations need not take up threads.
> ----------------------------------------------------------------
>                 Key: HADOOP-4116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4116
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
> The number of threads is currently limited on DataNodes. Once these threads are occupied, the DataNode does not accept any more requests (DoS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
> These operations waiting for active rebalancing threads to finish need not take up a
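The issue description suggests that balancing requests waiting for a permit need not pin DataXceiver threads. One way to achieve that is to refuse a request rather than block it when no permit is free. The sketch below is a hypothetical illustration, not the change made for HADOOP-4116. `BalancingAdmission`, `handleReplaceBlock`, and the `BUSY` reply are assumed names, and the sketch assumes the requester (for example the balancer) retries a refused move later.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch (not the actual DataXceiver): admit a balancing request
 * only if a permit is free right now; otherwise reply BUSY and return, so the
 * handler thread goes back to serving other requests instead of parking.
 */
class BalancingAdmission {
    enum Reply { COMPLETED, BUSY }

    private final Semaphore balancingSem;

    BalancingAdmission(int maxConcurrentMoves) {
        balancingSem = new Semaphore(maxConcurrentMoves);
    }

    Reply handleReplaceBlock(Runnable copyBlock) {
        if (!balancingSem.tryAcquire()) { // never parks the calling thread
            return Reply.BUSY;            // the requester is assumed to retry later
        }
        try {
            copyBlock.run();              // the actual, throttled block transfer
            return Reply.COMPLETED;
        } finally {
            balancingSem.release();
        }
    }

    int availablePermits() { return balancingSem.availablePermits(); }
}
```

Because `tryAcquire()` returns immediately, an idle balancing request costs no thread. The trade-off is that queueing and retry move to the requester.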

