hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4971) Block report times from datanodes could converge to same time.
Date Thu, 08 Jan 2009 00:44:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661796#action_12661796 ]

Raghu Angadi commented on HADOOP-4971:
--------------------------------------

test-patch : 

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.



> Block report times from datanodes could converge to same time.   
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4971
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4971
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: HADOOP-4971-branch-18.patch, HADOOP-4971.patch, HADOOP-4971.patch
>
>
> Datanode block reports take quite a bit of memory to process at the namenode. After the initial report, DNs pick a random time for subsequent reports to spread this load at the NN. This normally works fine.
> Block reports are sent from the "offerService()" thread in the DN. If for some reason this thread is stuck for a long time (comparable to the block report interval), and the same thing happens on many DNs, all of them get back to the loop at the same time, send their block reports then, and keep sending them every hour at the same time.
> The RPC server and clients in 0.18 can handle this situation fine. But since this is a memory-intensive RPC, it led to large GC delays at the NN. We don't know yet why the offerService threads seemed to be stuck, but the DN should re-randomize its block report time in such cases.
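
To illustrate the re-randomization idea described above, here is a minimal sketch. It is not the attached patch and not the actual DataNode code: the class BlockReportScheduler, its method names, and the choice of "late by at least one full interval" as the stall threshold are all assumptions made for this example.

```java
import java.util.Random;

/**
 * Hypothetical scheduler for periodic block reports (illustration only).
 * All names here are invented for this sketch.
 */
final class BlockReportScheduler {
    private final long intervalMs;   // block report interval, e.g. one hour
    private final Random random;
    private long lastReportMs;       // logical time of the last report

    BlockReportScheduler(long intervalMs, long startMs, Random random) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        this.intervalMs = intervalMs;
        this.random = random;
        this.lastReportMs = startMs;
    }

    /** True once a full interval has passed since the last logical report time. */
    boolean isReportDue(long nowMs) {
        return nowMs - lastReportMs >= intervalMs;
    }

    /** Time at which the next report becomes due. */
    long nextReportMs() {
        return lastReportMs + intervalMs;
    }

    /** Call after a report has been sent at time nowMs. */
    void onReportSent(long nowMs) {
        long lateBy = nowMs - (lastReportMs + intervalMs);
        if (lateBy >= intervalMs) {
            // Assumed stall threshold: the loop was stuck for at least one
            // full interval. Other nodes may have stalled too, so pick a
            // fresh random phase instead of keeping "now" as the schedule.
            long offset = (long) (random.nextDouble() * intervalMs);
            lastReportMs = nowMs - offset;
        } else {
            // Normal case: keep the original randomized phase.
            lastReportMs += intervalMs;
        }
    }
}
```

With this shape, a node that resumes after a long stall schedules its next report at a random point within the following interval, so nodes that resumed together do not stay aligned on later reports.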

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

