hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1765) Block Replication should respect under-replication block priority
Date Thu, 08 Dec 2011 19:12:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165445#comment-13165445 ]

Uma Maheswara Rao G commented on HDFS-1765:

Thanks a lot Eli for the nice review.

I have updated the patch with the comments addressed.


•Why do we have to hold the write lock across chooseUnderReplicatedBlocks and computeReplicationWorkForBlocks
now? Seems like the current granularity is sufficient.
   I had the same suspicion while working on this, but that synchronization was not introduced
as part of this JIRA.
   Shall I file a separate JIRA for it?
•computeReplicationWork writeUnlock needs to be in a finally clause
  Addressed as part of my above comment. Thanks for noticing.


•I'd make chooseUnderReplicatedBlocks synchronized instead of synchronizing on this
  Some of that code does not require synchronization, but there is no big benefit in excluding
the small loop before it. I changed it to method-level synchronization.

•Why reset all repl indices at the same time? Shouldn't we reset the repl index for a priority
only when that given priority runs out?
   This is because, if any of a queue's blocks cannot be cleared, that queue's replication
index will be at the end. If we reset it immediately here, then on the next replication interval
the same blocks will be iterated again, even though the blocks in the next priority queue have not been visited.

    take a case here:
       priority 3 has 20 blocks (which cannot be processed for replication work) and priority 4 has
10 blocks. blocksToProcess is 10.
       1st replication interval: it will take 10 blocks from priority 3 (assume the blocks
cannot find targets to replicate, so the replIndex will not be decremented).
       2nd replication interval: it will take 10 blocks from priority 3 again. Assume the same
case as in #1.
          If we reset replIndex here immediately, then on the 3rd replication interval it will
start visiting the blocks in priority 3 again. It will never get to priority 4.
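The scenario above can be simulated with a small self-contained model (a sketch only, not the actual HDFS code; the queue sizes, class name, and the reset flag are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of per-priority replication indices. Priority 3 holds 20 blocks
// that never find replication targets, priority 4 holds 10 blocks, and each
// replication interval may pick blocksToProcess = 10 blocks.
public class ReplIndexDemo {
    static final int[] QUEUE_SIZES = {0, 0, 0, 20, 10}; // blocks per priority
    static final int BLOCKS_TO_PROCESS = 10;

    // Runs one replication interval and returns the priorities it visited.
    // replIdx[p] is the scan position inside priority p's queue.
    static Set<Integer> runInterval(int[] replIdx, boolean resetImmediately) {
        Set<Integer> visited = new HashSet<>();
        int taken = 0;
        for (int p = 0; p < QUEUE_SIZES.length; p++) {
            while (replIdx[p] < QUEUE_SIZES[p] && taken < BLOCKS_TO_PROCESS) {
                visited.add(p);
                taken++;
                // The blocks fail to schedule, so the index is NOT
                // decremented and keeps advancing past them.
                replIdx[p]++;
            }
            if (resetImmediately && replIdx[p] >= QUEUE_SIZES[p]) {
                replIdx[p] = 0; // reset as soon as this queue runs out
            }
            if (taken >= BLOCKS_TO_PROCESS) {
                break;
            }
        }
        return visited;
    }
}
```

With the immediate reset, the 3rd interval re-scans priority 3 forever; with the reset deferred, the exhausted index lets the 3rd interval move on to priority 4.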

•I'd populate priorityVsReplIndex in the 1st loop so you don't need to check gets for null
when setting repleIndex
  Done in the UnderReplicatedBlocks constructor.

•I'd make the loop over blocks a while loop, eg 
while (blockCount < blocksToProcess && neededRepls.hasNext()) {
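That loop shape might look like the following self-contained sketch (the class, queue contents, and block names are hypothetical stand-ins, not the real UnderReplicatedBlocks iteration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the under-replicated queue scan.
class BlockChooser {
    private final List<String> neededReplications =
        new ArrayList<>(Arrays.asList("blk_1", "blk_2", "blk_3"));

    // The while form expresses both bounds in one condition, rather than a
    // for loop with an internal break on blockCount.
    synchronized List<String> chooseUnderReplicatedBlocks(int blocksToProcess) {
        List<String> chosen = new ArrayList<>();
        Iterator<String> neededRepls = neededReplications.iterator();
        int blockCount = 0;
        while (blockCount < blocksToProcess && neededRepls.hasNext()) {
            chosen.add(neededRepls.next());
            blockCount++;
        }
        return chosen;
    }
}
```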

•"All priority Queues last block index used for replication work." -> "Stores the replication
index for each priority"
•Rename "priorityVsReplIdx" to "priorityToReplIdx" and "repleIndex" to "replIndex"

•param javadocs for decrementReplicationIndex and chooseUnderReplicatedBlocks should be
on their own line
  Done. This was due to an Eclipse shortcut.

•I'd remove the comment about "# of blocks to process equals either twice the number of
live data-nodes" as it's a function of the argument blocksToProcess, so this comment goes
stale as soon as e.g. REPLICATION_WORK_MULTIPLIER_PER_ITERATION (which is multiple callers up) changes
  Sorry, removed the comment.

•I'd rename "replicationPriority" to just "priority"

•!neededRep.hasNext() instead of "false == neededRep.hasNext()"
•We don't need to qualify the code in chooseUnderReplicatedBlocks with UnderReplicatedBlocks
anymore (eg LEVEL and BlockIterator) now that it lives in this class

•The warn of unexpected replication priority should be an assert right? The block should
never have an invalid priority.
     Since there are no intermediate priority changes, this check is not needed now. The loop
and iterator are specific to a priority; they will never mismatch.


•Seems like the test needs to check that there are no blocks in the high priority queue
but there are still blocks in the low priority queue. It currently just tests that the high
priority queue is empty, but that could be because it has had sufficient time to empty all queues
  Now, after around ~7 blocks, the high priority block gets processed; completing all 100 blocks
would take ~25 seconds.
•Maybe write a unit test directly against chooseUnderReplicatedBlocks?
  Added test cases directly against chooseUnderReplicatedBlocks.
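A direct test of the selection order could look like this simplified, self-contained model (two priority levels only; the real tests exercise UnderReplicatedBlocks inside the namenode):

```java
import java.util.ArrayList;
import java.util.List;

// Toy two-level priority structure: index 0 = high priority, 1 = low.
class PriorityQueues {
    final List<List<String>> levels = new ArrayList<>();

    PriorityQueues() {
        levels.add(new ArrayList<>()); // high priority
        levels.add(new ArrayList<>()); // low priority
    }

    // Picks up to blocksToProcess blocks, draining higher priorities first.
    List<String> chooseUnderReplicatedBlocks(int blocksToProcess) {
        List<String> chosen = new ArrayList<>();
        for (List<String> level : levels) {
            while (!level.isEmpty() && chosen.size() < blocksToProcess) {
                chosen.add(level.remove(0));
            }
        }
        return chosen;
    }
}
```

The key assertion, per the review comment, is that after one selection pass the high priority queue is empty while the low priority queue still has blocks.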

•Nit: replace 3000 with a constant and set DFS_NAMENODE_REPLICATION_INTERVAL_KEY in the
conf in the test (you could lower it to 1000 so the test runs more quickly).

•Can import static org.junit.Assert.*
  Done. This was because of an Eclipse shortcut.
•"replicationinterval" is two words 
•"high priority blocks to process very quickly than the low priority blocks" -> "high
priority blocks are processed before the low priority blocks"


•Where you've replaced updateMetrics with the new code/comment to sleep how about putting
this in a method (eg waitForDeletion) and calling that.
  Sure, can do that. Done.
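A generic shape for such a helper (hypothetical names; the actual patch's waitForDeletion polls namenode state rather than a supplier):

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: block until a condition holds or a timeout
// elapses, instead of scattering bare Thread.sleep calls through the test.
public class WaitUtil {
    static void waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(pollMs);
        }
    }
}
```

A test would call e.g. `waitFor(() -> cluster.getBlocksScheduled() == 0, 10000, 100)` (method name here is illustrative), failing fast with a clear message on timeout.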
> Block Replication should respect under-replication block priority
> -----------------------------------------------------------------
>                 Key: HDFS-1765
>                 URL: https://issues.apache.org/jira/browse/HDFS-1765
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Hairong Kuang
>            Assignee: Uma Maheswara Rao G
>             Fix For: 0.24.0
>         Attachments: HDFS-1765.patch, HDFS-1765.patch, HDFS-1765.patch, HDFS-1765.pdf,
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
> Currently under-replicated blocks are assigned different priorities depending on how
> many replicas a block has. However the replication monitor works on blocks in a round-robin
> fashion. So the newly added high priority blocks won't get replicated until all low-priority
> blocks are done. One example is that on the decommissioning datanode WebUI we often observe that
> "blocks with only decommissioning replicas" do not get scheduled to replicate before other
> blocks, risking data availability if the node is shut down for repair before decommissioning completes.


