hadoop-common-dev mailing list archives

From "Jakob Homan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5777) ResolutionMonitor dies on an exception
Date Thu, 07 May 2009 19:38:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707061#action_12707061 ]

Jakob Homan commented on HADOOP-5777:
-------------------------------------

Hairong and I determined that the issue was caused by a race condition: many nodes with the
same storage ID registered at the same time (they came from cloned drives, which should not
normally happen), and the ResolutionMonitor was not properly synchronized. As a result, the
network location of a particular node was reset to UNRESOLVED (the empty string, "") before
the node was passed to add, which caused the substring call to fail.
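
As a rough illustration of that failure mode, the sketch below strips an ancestor path from a node's location using substring. It is not the Hadoop NetworkTopology source: the class TopologySketch, the helpers stripAncestor and failsOnUnresolved, and the example path "/rack1" are all hypothetical. It only shows that an empty location is shorter than any non-empty ancestor path, so the substring call throws.

```java
// Simplified illustration only; this is NOT the Hadoop NetworkTopology code.
// All names below (TopologySketch, stripAncestor, "/rack1") are hypothetical.
final class TopologySketch {
    // Per the comment above, an unresolved location is the empty string.
    static final String UNRESOLVED = "";

    // Returns the part of `location` that lies below `ancestorPath`.
    // Throws StringIndexOutOfBoundsException when `location` is shorter
    // than `ancestorPath`, which is always the case for UNRESOLVED and a
    // non-empty ancestor path.
    static String stripAncestor(String location, String ancestorPath) {
        return location.substring(ancestorPath.length());
    }

    // Reports whether stripping an ancestor from an unresolved location fails.
    static boolean failsOnUnresolved() {
        try {
            stripAncestor(UNRESOLVED, "/rack1");
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A resolved location works as expected.
        System.out.println(stripAncestor("/rack1/host1", "/rack1"));
        // A location reset to UNRESOLVED by another thread does not.
        System.out.println("fails on unresolved: " + failsOnUnresolved());
    }
}
```

In this sketch, any check that the location is valid has to happen under the same lock as the substring call; otherwise another thread can reset the location between the check and the use.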

Since the ResolutionMonitor has since been removed, this is not worth fixing, and I will close
the issue as Won't Fix.

> ResolutionMonitor dies on an exception
> --------------------------------------
>
>                 Key: HADOOP-5777
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5777
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Hairong Kuang
>            Assignee: Jakob Homan
>
> One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero
> bytes but no rack information. It turned out the ResolutionMonitor thread died on an exception.
> Here is the stack trace of the exception that caused the problem:
> ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>         at java.lang.String.substring(String.java:1938)
>         at java.lang.String.substring(String.java:1905)
>         at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
>         at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
>         at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
>         at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
>         at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

