hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2606) Namenode unstable when replicating 500k blocks at once
Date Sat, 15 Mar 2008 16:22:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579060#action_12579060

Hadoop QA commented on HADOOP-2606:

-1 overall.  Here are the results of testing the latest attachment 
against trunk revision 619744.

    @author +1.  The patch does not contain any @author tags.

    tests included +1.  The patch appears to include 3 new or modified tests.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warnings.

    core tests -1.  The patch failed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1973/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1973/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1973/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1973/console

This message is automatically generated.

> Namenode unstable when replicating 500k blocks at once
> ------------------------------------------------------
>                 Key: HADOOP-2606
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2606
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.3
>            Reporter: Koji Noguchi
>            Assignee: Konstantin Shvachko
>             Fix For: 0.17.0
>         Attachments: ReplicatorNew.patch, ReplicatorTestOld.patch
> We tried to decommission about 40 nodes at once, each containing 12k blocks. (about 500k
> (This also happened when we first tried to decommission 2 million blocks)
> Clients started experiencing  "java.lang.RuntimeException: java.net.SocketTimeoutException:
timed out waiting for rpc
> response" and namenode was in 100% cpu state. 
> It was spending most of its time on one thread, 
> "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@7f401d28" daemon prio=10 tid=0x0000002e10702800
> runnable [0x0000000041a42000..0x0000000041a42a30]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.dfs.FSNamesystem.containingNodeList(FSNamesystem.java:2766)
>         at org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2870)
>         - locked <0x0000002aa3cef720> (a org.apache.hadoop.dfs.UnderReplicatedBlocks)
>         - locked <0x0000002aa3c42e28> (a org.apache.hadoop.dfs.FSNamesystem)
>         at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1928)
>         at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1868)
>         at java.lang.Thread.run(Thread.java:619)
> We confirmed that Namenode was not in the fullGC states when these problem happened.
> Also, dfsadmin -metasave was showing "Blocks waiting for replication" was decreasing
very slowly.
> I believe this is not specific to decommission and same problem would happen if we lose
one rack.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message