hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-659) Boost the priority of re-replicating blocks that are far from their replication target
Date Thu, 02 Nov 2006 19:45:18 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-659?page=comments#action_12446706 ] 
            
Konstantin Shvachko commented on HADOOP-659:
--------------------------------------------

If you mean a total order by the number of copies remaining,
the problem is with the second case:
we have 3 healthy copies of the block and we need 10,
and such a total order does not boost the priority of these blocks.
If you mean a total order by the number of copies missing,
then we do not specifically focus on blocks that have only 1 remaining copy.
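
To make the two objections concrete, here is a small sketch that sorts three
made-up blocks under each total order. The Block class, the block names and
the sort directions (fewest copies remaining first, most copies missing first)
are assumptions for illustration only and are not taken from the name-node code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: hypothetical blocks, not real name-node structures.
class TotalOrderDemo {
    static final class Block {
        final String name;
        final int current;  // healthy copies right now
        final int target;   // required replication
        Block(String name, int current, int target) {
            this.name = name;
            this.current = current;
            this.target = target;
        }
        int missing() { return target - current; }
    }

    // Sorts the same three sample blocks with the given total order.
    static List<String> namesSortedBy(Comparator<Block> order) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("A", 1, 3));   // case 1: only one copy left
        blocks.add(new Block("B", 3, 10));  // case 2: less than 1/3 in place
        blocks.add(new Block("C", 2, 3));   // ordinary under-replication
        blocks.sort(order);
        List<String> names = new ArrayList<>();
        for (Block b : blocks) {
            names.add(b.name);
        }
        return names;
    }

    // Fewest remaining copies first: B ends up behind C, so it is not boosted.
    static List<String> byCopiesRemaining() {
        return namesSortedBy(Comparator.comparingInt((Block b) -> b.current));
    }

    // Most missing copies first: A, the single-copy block, is no longer first.
    static List<String> byCopiesMissing() {
        return namesSortedBy(
            Comparator.comparingInt((Block b) -> b.missing()).reversed());
    }
}
```

Under the first order the 3-of-10 block is served last, and under the second
the single-copy block falls behind it, so neither order treats both cases as urgent.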

The main disadvantage of any total order would be its maintenance cost.
Having 2 groups means that I place each entry either at the beginning
of the list or at the end, which is O(1).
Maintaining a total order requires O(log n) operations per access, and you
cannot use a hash, since the number of collisions is expected to be large.
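
A minimal sketch of the two-group list, assuming a plain doubly linked list of
block ids. TwoGroupQueue, add and poll are hypothetical names chosen for this
example; the real neededReplication list may look different.

```java
import java.util.Deque;
import java.util.LinkedList;

// Hypothetical sketch of a two-group replication list; not the actual code.
class TwoGroupQueue {
    private final Deque<Long> blocks = new LinkedList<>();

    // O(1): first-priority blocks go to the head, all others to the tail.
    void add(long blockId, boolean firstPriority) {
        if (firstPriority) {
            blocks.addFirst(blockId);
        } else {
            blocks.addLast(blockId);
        }
    }

    // O(1): the next block to replicate is always taken from the head.
    // Returns null when the list is empty.
    Long poll() {
        return blocks.pollFirst();
    }
}
```

One consequence of inserting at the head is that, within the first-priority
group, the most recently added block is served first. A sorted structure such
as java.util.TreeMap would keep a full order but costs O(log n) per insertion.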

> Boost the priority of re-replicating blocks that are far from their replication target
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-659
>                 URL: http://issues.apache.org/jira/browse/HADOOP-659
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.7.2
>            Reporter: Konstantin Shvachko
>         Assigned To: Konstantin Shvachko
>
> I see two types of replications that should be accelerated compared to all others.
> 1. Blocks that have only one remaining copy (but are required to have higher replication).
> 2. Blocks that have less than 1/3 of their replicas in place.
> The latter occurs when map/reduce sets the replication of certain files to 10, and we want
> it to happen fast to achieve better performance on the tasks.
> So I think we should distinguish two major groups of under-replicated blocks:
> first-priority (having only 1 copy or less than 1/3 of required replicas), and the rest.
> The name-node places first-priority blocks into the beginning of the neededReplication
> list, and the rest are placed at the end. That way the first-priority blocks will be replicated
> first and then the others.
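
The first-priority rule in the description above could be written as a small
predicate. This is only a sketch: ReplicationPriority, isFirstPriority and the
parameter names are assumptions, and blocks with zero remaining copies are not
discussed in the issue, so they get no special handling here.

```java
// Hypothetical helper; names and signature are not from the Hadoop source.
class ReplicationPriority {
    // curReplicas: healthy copies now; expectedReplicas: required replication.
    static boolean isFirstPriority(int curReplicas, int expectedReplicas) {
        if (curReplicas >= expectedReplicas) {
            return false;  // not under-replicated at all
        }
        // Case 1: only one copy remains, but more are required.
        boolean singleCopy = curReplicas == 1;
        // Case 2: less than 1/3 of the replicas are in place.
        // Written as 3 * cur < expected to avoid integer division.
        boolean belowThird = curReplicas * 3 < expectedReplicas;
        return singleCopy || belowThird;
    }
}
```

With this rule a block with 3 of 10 replicas is first-priority (9 < 10), while
a block with 4 of 10 or 2 of 3 replicas falls into the second group.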

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
