hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1486) ReplicationMonitor thread goes away
Date Thu, 28 Jun 2007 19:15:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508912 ]

Doug Cutting commented on HADOOP-1486:
--------------------------------------

> If we create a new instance of the NameNode within the same JVM, then the GC process
> might take a while before the memory situation stabilizes.

That's possible, I suppose, but it's also possible that the GC might handle this well.  GC time
is often proportional to the amount of non-garbage, which would be small on restart.

> Is it ok if I exit the namenode-jvm completely and leave it to the administrator to restart
> the namenode if necessary?

Sure, that'd be okay.  But if the namenode auto-restarts slowly, the admin can always kill
and restart it manually, so I don't see the harm in it attempting to auto-restart.  Restarting
slowly isn't worse than being down, is it?  So my instinct would be to try auto-restarting.
It shouldn't cause data loss, and might indeed help in many cases, so why not?
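
To make the trade-off above concrete, here is a minimal Java sketch of an in-JVM auto-restart policy. It is not the HADOOP-1486 patch or any Hadoop API: `RestartSketch`, `Service`, `superviseWithRestart` and the `maxRestarts` bound are all hypothetical names chosen for illustration. The sketch retries a failed service a bounded number of times and then gives up, which corresponds to leaving the restart to the administrator.

```java
import java.util.function.Supplier;

public class RestartSketch {
    /** Hypothetical stand-in for a long-running service such as the namenode. */
    public interface Service {
        void run() throws Exception; // returns normally only on a clean shutdown
    }

    /**
     * Runs a fresh Service from the factory; if it fails, builds a new one and
     * tries again, up to maxRestarts times. Returns the number of restarts used.
     * Once the budget is exhausted the last failure is rethrown, which is where
     * a real daemon might exit the JVM and wait for an administrator.
     */
    public static int superviseWithRestart(Supplier<Service> factory, int maxRestarts) {
        int restarts = 0;
        while (true) {
            try {
                factory.get().run();
                return restarts; // clean shutdown
            } catch (Throwable t) {
                if (restarts >= maxRestarts) {
                    throw new IllegalStateException(
                        "giving up after " + restarts + " restarts", t);
                }
                restarts++;
                System.err.println("service failed, restart #" + restarts + ": " + t);
            }
        }
    }
}
```

A slow restart inside this loop costs nothing that a manual kill and restart would not also cost, which is the point made in the comment above.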

> ReplicationMonitor thread goes away 
> ------------------------------------
>
>                 Key: HADOOP-1486
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1486
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Koji Noguchi
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: catchThrowable2.patch
>
>
> Saw many over/under replicated blocks in fsck output.
> .out file showed
> Exception in thread "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@2785982c" java.lang.IllegalArgumentException: Unexpected non-existing data node: /99.9.99.0/99.9.99.42:99999
>   at org.apache.hadoop.net.NetworkTopology.checkArgument(NetworkTopology.java:379)
>   at org.apache.hadoop.net.NetworkTopology.isOnSameRack(NetworkTopology.java:424)
>   at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2853)
>   at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2816)
>   at org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2658)
>   at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1774)
>   at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1723)
>   at java.lang.Thread.run(Thread.java:619)
> (same as HADOOP-1232)
> And, jstack showed no ReplicationMonitor thread.
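
The symptom in the report (an exception escaping `run()`, after which jstack no longer lists the thread) can be reproduced with a small Java sketch. The attachment name suggests the fix involves catching `Throwable`, but the patch itself is not shown here, so the guarded loop below is only an assumption about the general technique. `MonitorLoopSketch`, `Work`, `unguarded`, `guarded` and `attempts` are hypothetical names, not Hadoop code.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MonitorLoopSketch {
    /** Hypothetical unit of periodic work, e.g. one replication scan. */
    public interface Work {
        void doOnce();
    }

    /** Unguarded loop: the first exception escapes run() and the thread terminates. */
    public static Thread unguarded(Work work, int rounds) {
        return new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                work.doOnce();
            }
        }, "unguarded-monitor");
    }

    /** Guarded loop: each round catches Throwable, logs it, and keeps going. */
    public static Thread guarded(Work work, int rounds) {
        return new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                try {
                    work.doOnce();
                } catch (Throwable t) {
                    System.err.println("monitor round failed, continuing: " + t);
                }
            }
        }, "guarded-monitor");
    }

    /** Runs work that fails on its second call; returns how many calls were made. */
    public static int attempts(boolean guard, int rounds) {
        AtomicInteger calls = new AtomicInteger();
        Work failing = () -> {
            if (calls.incrementAndGet() == 2) {
                throw new IllegalArgumentException("Unexpected non-existing data node");
            }
        };
        Thread t = guard ? guarded(failing, rounds) : unguarded(failing, rounds);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return calls.get();
    }
}
```

With the unguarded loop the work stops at the failing call and never resumes; with the guarded loop every round is attempted. Catching `Throwable` also swallows errors such as `OutOfMemoryError`, which is one reason the thread discusses exiting or restarting the namenode rather than silently continuing.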

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

