hadoop-hdfs-issues mailing list archives

From "Haryadi Gunawi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1765) Block Replication should respect under-replication block priority
Date Mon, 20 Jun 2011 17:18:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052086#comment-13052086 ]

Haryadi Gunawi commented on HDFS-1765:
--------------------------------------

I agree with Hairong. Recently, I've been playing around with this and found the same problem,
as shown in the attachment (underReplicatedQueue.pdf).

At a high level, if the round-robin iterator is in queue-2 (the queue with priority 2), then
the under-replicated (UR) blocks in queue-0 must wait until the iterator wraps around to
queue-0 again. So, I assume, in the worst case, if queue-2 is long (as depicted in the graph),
the UR blocks in queue-0 will take a very long time to be served!
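
A minimal Java sketch of the behavior described above, under stated assumptions: a single cursor drains the current priority queue before moving on to the next one, and wraps from the last queue back to queue-0. `RoundRobinQueues`, `poll`, and `demo` are hypothetical names for a toy model, not the NameNode's actual under-replication code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of round-robin service over priority queues
// (queue 0 = highest priority). Not the actual NameNode implementation.
class RoundRobinQueues {
    private final List<Deque<String>> queues = new ArrayList<>();
    private int level = 0; // priority queue the cursor currently points at

    RoundRobinQueues(int levels) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    void add(int priority, String block) {
        queues.get(priority).addLast(block);
    }

    // Serves the head of the queue the cursor points at; only when that queue
    // is empty does the cursor advance, wrapping from the last queue to queue 0.
    String poll() {
        for (int tried = 0; tried < queues.size(); tried++) {
            Deque<String> q = queues.get(level);
            if (!q.isEmpty()) {
                return q.pollFirst();
            }
            level = (level + 1) % queues.size();
        }
        return null; // all queues are empty
    }

    static List<String> demo() {
        RoundRobinQueues rr = new RoundRobinQueues(3); // queue-0, queue-1, queue-2
        for (int i = 1; i <= 5; i++) {
            rr.add(2, "b" + i);
        }
        List<String> order = new ArrayList<>();
        order.add(rr.poll()); // b1: cursor advances from queue-0 to queue-2
        order.add(rr.poll()); // b2
        rr.add(0, "urgent");  // a block that is now down to one replica
        String b;
        while ((b = rr.poll()) != null) {
            order.add(b);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [b1, b2, b3, b4, b5, urgent]
    }
}
```

In the demo, the block added to queue-0 while the cursor sits in queue-2 is served only after every remaining queue-2 block.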

The setup for the figure:
I have 20 nodes, each holding 3000 blocks, and I fail 4 of them.
q-0: UR blocks with 1 replica
q-2: UR blocks with 2 replicas
pq: pending queue
(I stopped the experiment midway because the pattern was already obvious.)
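
A rough estimate of why q-2 is so much longer than q-0 in this setup. This sketch assumes things the experiment does not state: a replication factor of 3, "3000 blocks" meaning 3000 replicas per node, and each block's replicas placed on 3 distinct nodes chosen uniformly at random (real HDFS placement is rack-aware). Under those assumptions, the number of a block's replicas lost to the 4 failures is hypergeometric. `ExpectedQueueSizes` is a hypothetical helper.

```java
// Back-of-the-envelope estimate of UR queue sizes after node failures.
// ASSUMPTIONS (not from the experiment): replication factor 3, 3000 replicas
// per node, replicas on distinct nodes chosen uniformly at random.
class ExpectedQueueSizes {
    // Binomial coefficient; exact because each intermediate r is C(n-k+i, i).
    static long choose(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long r = 1;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    // Expected number of blocks that lost exactly `lost` replicas when
    // `failed` of `nodes` datanodes go down (hypergeometric probability).
    static double expectedBlocks(int nodes, int failed, int replicasPerNode,
                                 int replication, int lost) {
        double totalBlocks = (double) nodes * replicasPerNode / replication;
        double p = (double) choose(failed, lost)
                * choose(nodes - failed, replication - lost)
                / choose(nodes, replication);
        return totalBlocks * p;
    }

    public static void main(String[] args) {
        // q-0: one replica left (lost 2); q-2: two replicas left (lost 1)
        System.out.printf("q-0 ~ %.0f blocks%n", expectedBlocks(20, 4, 3000, 3, 2));
        System.out.printf("q-2 ~ %.0f blocks%n", expectedBlocks(20, 4, 3000, 3, 1));
        // blocks with no live replica cannot be re-replicated at all
        System.out.printf("no replica left ~ %.0f blocks%n", expectedBlocks(20, 4, 3000, 3, 3));
    }
}
```

Under these assumptions the expected sizes are about 8,421 blocks for q-2 and 1,684 for q-0, a 5:1 ratio (480 versus 96 of the C(20,3) = 1140 equally likely node triples).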

More details on why the round-robin iterator does not work:

It is true that the round-robin iterator goes through queue-0 first,
but the replication monitor runs this logic:
- choose a block B to be replicated
- pick a source node S that still has a replica of B
- BUT if S has already been chosen to replicate other blocks
  (i.e., S's replication streams have already reached maxrepstream (2)),
  then skip B and increment the iterator. B, even though it is in queue-0,
  will not be served until the round-robin iterator wraps around.
  And if the other queues (e.g., q1 and q2) are very long, then queue-0
  might be starved for a long time.
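
The skip-and-advance logic in the list above can be modeled as one pass over the blocks in iterator order. This is a hypothetical sketch: `ReplicationPass`, `schedule`, and the rule that a source counts as busy once its in-flight replications reach `maxRepStreams` are assumptions for illustration, not the NameNode's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a single replication-monitor pass. Class and method
// names are illustrative, not the actual NameNode API.
class ReplicationPass {
    static final class Block {
        final String id;
        final String source; // datanode picked to copy this block
        Block(String id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    // Walks blocks in iterator order. A block whose source already has
    // maxRepStreams replications in flight is skipped; the iterator moves on.
    static List<String> schedule(List<Block> iterationOrder,
                                 Map<String, Integer> streams,
                                 int maxRepStreams,
                                 List<String> skipped) {
        List<String> scheduled = new ArrayList<>();
        for (Block b : iterationOrder) {
            int inFlight = streams.getOrDefault(b.source, 0);
            if (inFlight >= maxRepStreams) {
                skipped.add(b.id); // not revisited until the iterator wraps
                continue;
            }
            streams.put(b.source, inFlight + 1);
            scheduled.add(b.id);
        }
        return scheduled;
    }

    public static void main(String[] args) {
        List<Block> order = List.of(
                new Block("q0-B", "S"),  // one replica left; its only source is S
                new Block("q2-X", "D1"),
                new Block("q2-Y", "D2"),
                new Block("q2-Z", "D1"));
        Map<String, Integer> streams = new HashMap<>();
        streams.put("S", 2); // S is already serving two other replications
        List<String> skipped = new ArrayList<>();
        List<String> scheduled = schedule(order, streams, 2, skipped);
        System.out.println("scheduled=" + scheduled + " skipped=" + skipped);
        // scheduled=[q2-X, q2-Y, q2-Z] skipped=[q0-B]
    }
}
```

All three queue-2 blocks get scheduled, while q0-B, the most endangered block, is passed over because its only source S is saturated.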



> Block Replication should respect under-replication block priority
> -----------------------------------------------------------------
>
>                 Key: HDFS-1765
>                 URL: https://issues.apache.org/jira/browse/HDFS-1765
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.23.0
>
>
> Currently, under-replicated blocks are assigned different priorities depending on how
> many replicas a block has. However, the replication monitor works on blocks in a round-robin
> fashion, so newly added high-priority blocks won't get replicated until all low-priority
> blocks are done. For example, on the decommissioning datanode WebUI we often observe that
> "blocks with only decommissioning replicas" do not get scheduled for replication before other
> blocks, which risks data availability if the node is shut down for repair before
> decommissioning completes.
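
What "respecting priority" could look like, as a hypothetical sketch only (not the patch committed for HDFS-1765): each selection scans from queue-0 upward, so a newly added high-priority block goes ahead of pending low-priority ones. `PriorityFirstQueues` and `demo` are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical priority-respecting selection: every poll starts scanning
// at queue 0 (highest priority). Illustrative only.
class PriorityFirstQueues {
    private final List<Deque<String>> queues = new ArrayList<>();

    PriorityFirstQueues(int levels) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    void add(int priority, String block) {
        queues.get(priority).addLast(block);
    }

    // Returns the oldest block from the highest-priority non-empty queue.
    String poll() {
        for (Deque<String> q : queues) {
            if (!q.isEmpty()) {
                return q.pollFirst();
            }
        }
        return null; // all queues are empty
    }

    static List<String> demo() {
        PriorityFirstQueues pq = new PriorityFirstQueues(3);
        for (int i = 1; i <= 5; i++) {
            pq.add(2, "b" + i);
        }
        List<String> order = new ArrayList<>();
        order.add(pq.poll()); // b1
        order.add(pq.poll()); // b2
        pq.add(0, "urgent");  // a block that is now down to one replica
        String b;
        while ((b = pq.poll()) != null) {
            order.add(b);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [b1, b2, urgent, b3, b4, b5]
    }
}
```

Here the urgent block is served third instead of last. Strict priority has its own trade-off: a steady stream of high-priority blocks can delay low-priority ones indefinitely, and on its own it does not address the busy-source skipping described in the comment.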

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
