hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1486) ReplicationMonitor thread goes away
Date Wed, 20 Jun 2007 21:20:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506680 ]

Hairong Kuang commented on HADOOP-1486:
---------------------------------------

> The ReplicationMonitor thread catches all types of exceptions, logs them, sleep for 5
> seconds and then continue from the beginning.

This solution makes sure that ReplicationMonitor does not go away when a RuntimeException occurs.
But is it possible that this solution leaves the namenode in an inconsistent state? What if ReplicationMonitor
is in the middle of updating some data structures when the exception occurs? If this is possible,
option 1 might be a safer solution.
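
To make the concern concrete, here is a minimal sketch of a catch-and-continue loop. It is not the code from catchThrowable.patch: MonitorLoopSketch, State, and runLoop are hypothetical names, and the two counters only stand in for related namenode data structures that are supposed to change together. The loop survives the exception, but the half-finished update is never undone.

```java
/** Hypothetical sketch; not the code from catchThrowable.patch. */
public class MonitorLoopSketch {

    /** Two counters that are meant to stay equal (stand-in for related structures). */
    static final class State {
        int scheduled;
        int recorded;
        boolean consistent() { return scheduled == recorded; }
    }

    /**
     * Runs work for a fixed number of rounds, catching anything it throws
     * so that the loop keeps going. Returns how many rounds failed.
     */
    static int runLoop(Runnable work, int rounds, long sleepMillis) {
        int failures = 0;
        for (int i = 0; i < rounds; i++) {
            try {
                work.run();
            } catch (Throwable t) {
                failures++;
                System.err.println("monitor round failed: " + t);
                try {
                    // The issue describes a 5 second pause before retrying.
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        State state = new State();
        int[] round = {0};
        int failures = runLoop(() -> {
            state.scheduled++;          // first half of the update
            if (round[0]++ == 0) {
                throw new IllegalArgumentException("Unexpected non-existing data node");
            }
            state.recorded++;           // second half, skipped when the round fails
        }, 3, 0L);
        // Prints: failures=1 consistent=false
        System.out.println("failures=" + failures + " consistent=" + state.consistent());
    }
}
```

In this sketch the thread keeps running after the failed round, yet scheduled and recorded stay out of step afterwards. Whether the real ReplicationMonitor can be interrupted between two such dependent updates is exactly the open question.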

> ReplicationMonitor thread goes away 
> ------------------------------------
>
>                 Key: HADOOP-1486
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1486
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: catchThrowable.patch
>
>
> Saw many over/under replicated blocks in fsck output.
> .out file showed
> Exception in thread "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@2785982c" java.lang.IllegalArgumentException: Unexpected non-existing data node: /99.9.99.0/99.9.99.42:99999
>   at org.apache.hadoop.net.NetworkTopology.checkArgument(NetworkTopology.java:379)
>   at org.apache.hadoop.net.NetworkTopology.isOnSameRack(NetworkTopology.java:424)
>   at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2853)
>   at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2816)
>   at org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2658)
>   at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1774)
>   at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1723)
>   at java.lang.Thread.run(Thread.java:619)
> (same as HADOOP-1232)
> And, jstack showed no ReplicationMonitor thread.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

