hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-285) Data nodes cannot re-join the cluster once connection is lost
Date Wed, 07 Jun 2006 17:28:31 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-285?page=all ]

Hairong Kuang updated HADOOP-285:

    Attachment: datanode.patch

This patch starts the data receiver thread in run() instead of offerService(), so the
thread is not started a second time after a connection is lost.
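
The control flow described above can be sketched roughly as follows. This is only an
illustration, not the contents of datanode.patch: NodeSketch, receiverStarts,
serviceAttempts and the simulated IOException are hypothetical names, and the receiver
is assumed to be a plain java.lang.Thread.

```java
import java.io.IOException;

public class ReceiverStartSketch {

    /** Hypothetical stand-in for a data node; not the real Hadoop DataNode class. */
    static class NodeSketch {
        // Stand-in for the data receiver thread (dataXceiveServer in the report).
        private final Thread receiver = new Thread(() -> { });
        int receiverStarts = 0;   // how many times start() was called on the receiver
        int serviceAttempts = 0;  // how many times offerService() was entered

        /** Simulated service loop: the first attempt loses the connection. */
        void offerService() throws IOException {
            serviceAttempts++;
            if (serviceAttempts == 1) {
                throw new IOException("simulated lost connection to name node");
            }
            // Heartbeats would be sent from here in a real node.
        }

        /** Starts the receiver once, then retries offerService() until it succeeds. */
        int run(int maxAttempts) {
            receiver.start();          // started exactly once, outside the retry loop
            receiverStarts++;
            while (serviceAttempts < maxAttempts) {
                try {
                    offerService();
                    break;             // service ran without losing the connection
                } catch (IOException e) {
                    // Connection lost: loop and re-enter offerService() only.
                }
            }
            return serviceAttempts;
        }
    }

    public static void main(String[] args) {
        NodeSketch node = new NodeSketch();
        int attempts = node.run(5);
        System.out.println("offerService attempts: " + attempts);
        System.out.println("receiver starts: " + node.receiverStarts);
    }
}
```

Because start() sits outside the retry loop, re-entering offerService() after a lost
connection never touches the receiver thread again.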

> Data nodes cannot re-join the cluster once connection is lost
> -------------------------------------------------------------
>          Key: HADOOP-285
>          URL: http://issues.apache.org/jira/browse/HADOOP-285
>      Project: Hadoop
>         Type: Bug

>   Components: dfs
>     Versions: 0.3.0
>     Reporter: Konstantin Shvachko
>     Assignee: Hairong Kuang
>  Attachments: datanode.patch
> A data node loses its connection to a name node and then tries to offerService() again.
> HADOOP-270 changes force it to start dataXceiveServer, which is already started and in this case
> throws IllegalThreadStateException, which goes on in a loop, and never reaches the heartbeat
> So the data node never re-joins the cluster, while from the outside it looks like it's still
> This is another reason why we see missing data, and don't see failed data nodes.
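
The failure mode in the quoted report can be reproduced in isolation. The sketch below
is an assumption-laden illustration, not Hadoop code: it assumes the server is a plain
java.lang.Thread, and RestartLoopSketch, secondStartFails and heartbeatsReached are
hypothetical names. It relies only on the documented behaviour that Thread.start()
throws IllegalThreadStateException when the thread has already been started.

```java
public class RestartLoopSketch {

    /** Returns true if a second start() on the same Thread is rejected. */
    static boolean secondStartFails() {
        Thread server = new Thread(() -> { });
        server.start();
        try {
            server.start();  // a Thread may be started only once
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }

    /**
     * Simulates the retry loop from the report: each retry tries to start the
     * already-started server first, so the heartbeat step after it is skipped.
     */
    static int heartbeatsReached(int retries) {
        Thread server = new Thread(() -> { });
        server.start();      // started by the first service attempt
        int heartbeats = 0;
        for (int i = 0; i < retries; i++) {
            try {
                server.start();  // re-entered service loop starts it again
                heartbeats++;    // stand-in for the heartbeat step
            } catch (IllegalThreadStateException e) {
                // Swallowed by the outer loop, which simply retries.
            }
        }
        return heartbeats;
    }

    public static void main(String[] args) {
        System.out.println("second start rejected: " + secondStartFails());
        System.out.println("heartbeats reached in 3 retries: " + heartbeatsReached(3));
    }
}
```

In this simulation every retry fails before the heartbeat step, which matches the
symptom described: the process stays alive but never reports to the name node.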

This message is automatically generated by JIRA.
