hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4116) Balancer should provide better resource management
Date Sat, 20 Sep 2008 01:09:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632906#action_12632906 ]

Konstantin Shvachko commented on HADOOP-4116:
---------------------------------------------

- I do not understand the reason for changing the push block model to a pull model.
Previously the balancer would send an OP_COPY_BLOCK request to the proxy node, which then sent OP_REPLACE_BLOCK
to the target.
Now it is reversed: the balancer sends OP_REPLACE_BLOCK to the target, which sends OP_COPY_BLOCK
to the proxy.
In both cases DataXceiver is supposed to check the XceiverCount and throw an exception if
the limit is exceeded.
So in both cases the transfer will fail if either of the two data-nodes is busy.
I don't see a mistake here, but I don't see a reason for this rather radical change either;
maybe I am missing something.
- BalanceManager 
-- should probably be called {{BlockBalanceThrottler}}.
-- It makes sense to derive it from {{BlockTransferThrottler}}.
-- It should be a static class.
-- And it should rather be a member of {{DataXceiveServer}} than {{DataNode}}.
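
The four suggestions above can be illustrated with a minimal Java sketch. This is not the code from balancerRM.patch: the classes below are local stand-ins that only borrow the names used in this comment, and the constructor parameters, the {{maxThreads}} cap, and the {{acquire()}}/{{release()}} methods are assumptions made for illustration.

```java
public class ThrottlerLayoutSketch {

    // Stand-in for the existing bandwidth throttler; its single field is assumed.
    public static class BlockTransferThrottler {
        private final long bytesPerSecond;

        public BlockTransferThrottler(long bytesPerSecond) {
            this.bytesPerSecond = bytesPerSecond;
        }

        public long getBandwidth() {
            return bytesPerSecond;
        }
    }

    // Stand-in for the server that accepts data transfer connections.
    public static class DataXceiveServer {

        // Static nested class: it needs no reference to the enclosing server instance.
        public static class BlockBalanceThrottler extends BlockTransferThrottler {
            private final int maxThreads; // hypothetical cap on concurrent balancing moves
            private int numThreads = 0;

            public BlockBalanceThrottler(long bytesPerSecond, int maxThreads) {
                super(bytesPerSecond);
                this.maxThreads = maxThreads;
            }

            // Claims a balancing slot; returns false when all slots are taken.
            public synchronized boolean acquire() {
                if (numThreads >= maxThreads) {
                    return false;
                }
                numThreads++;
                return true;
            }

            // Frees a slot previously claimed with acquire().
            public synchronized void release() {
                if (numThreads > 0) {
                    numThreads--;
                }
            }
        }

        // Held by the server rather than by the data-node object.
        public final BlockBalanceThrottler balanceThrottler;

        public DataXceiveServer(long bytesPerSecond, int maxThreads) {
            this.balanceThrottler = new BlockBalanceThrottler(bytesPerSecond, maxThreads);
        }
    }

    public static void main(String[] args) {
        DataXceiveServer server = new DataXceiveServer(1024 * 1024, 1);
        System.out.println(server.balanceThrottler.acquire()); // true
        System.out.println(server.balanceThrottler.acquire()); // false
        server.balanceThrottler.release();
        System.out.println(server.balanceThrottler.acquire()); // true
    }
}
```

Deriving from the bandwidth throttler would let one object carry both limits (bytes per second and concurrent moves), which is presumably the point of the renaming.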

> Balancer should provide better resource management
> --------------------------------------------------
>
>                 Key: HADOOP-4116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4116
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.2, 0.19.0
>
>         Attachments: balancerRM.patch
>
>
> The number of threads is currently limited on datanodes. Once these threads are occupied,
DataNode does not accept any more requests (DoS). Recently we saw a case where most of the
256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}.
Since rebalancing is (heavily) throttled, I would think this would be the common case.
> Operations that are waiting for active rebalancing threads to finish need not take up a
thread.
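
One way to keep waiting operations from holding a thread, as the description asks, is to reject a request immediately when no balancing slot is free instead of blocking on the semaphore. The following is a rough sketch under that assumption, not the fix attached to this issue; the class name, the {{Runnable}} parameter, and the boolean return value are all hypothetical.

```java
import java.util.concurrent.Semaphore;

public class NonBlockingBalanceSketch {
    // Plays the role of balancingSem from the description.
    private final Semaphore balancingSem;

    public NonBlockingBalanceSketch(int permits) {
        this.balancingSem = new Semaphore(permits);
    }

    // Runs the transfer only if a slot is free right now.
    // Returns false at once otherwise, so the calling thread is never parked.
    public boolean replaceBlock(Runnable transfer) {
        if (!balancingSem.tryAcquire()) {
            return false; // the caller can report "busy" and let the balancer retry later
        }
        try {
            transfer.run();
            return true;
        } finally {
            balancingSem.release();
        }
    }

    public int availableSlots() {
        return balancingSem.availablePermits();
    }
}
```

With {{acquire()}} a thread stays parked until a permit is released, which is how most of the 256 threads ended up waiting; {{tryAcquire()}} returns straight away, so the thread goes back to serving other requests.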

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

