hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1845) Datanodes get error message "is valid, and cannot be written to"
Date Tue, 25 Sep 2007 23:29:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530323 ]

Konstantin Shvachko commented on HADOOP-1845:
---------------------------------------------

HADOOP-1946 is the main reason for that behavior.
The data-nodes were very slow, so the blockReceived() calls were in fact arriving after the file had been closed.
Do we still want to remove the exception mentioned in the initial description?


> Datanodes get error message "is valid, and cannot be written to" 
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1845
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1845
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Hairong Kuang
>             Fix For: 0.15.0
>
>
> >> Copy from dev list:
> Our cluster has 4 nodes and I set the mapred.submit.replication parameter to 2 on all
> nodes and the master. Everything has been restarted.
> Unfortunately, we still have the same exception:
> 2007-09-05 17:01:59,623 ERROR org.apache.hadoop.dfs.DataNode:
> DataXceiver: java.io.IOException: Block blk_-5969983648201186681 is valid, and cannot be written to.
>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:515)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:822)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
>         at java.lang.Thread.run(Thread.java:595)
> >> end of copy
> The message shows that the namenode schedules the replication of a block to a datanode that
> already holds the block. The namenode block placement algorithm makes sure that it does not
> schedule a block to a datanode that is confirmed to hold a replica of the block. But it is
> not aware of any in-transit block placements (i.e., placements that are scheduled but not
> yet confirmed), so occasionally we may still see "is valid, and cannot be written to" errors.
> A fix to the problem is to keep track of all in-transit block placements, and have the block
> placement algorithm consider these to-be-confirmed replicas as well.
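The proposed fix could be sketched as follows. This is a hypothetical illustration in plain Java, not the actual HDFS patch; the class and method names (PendingPlacements, schedule, confirm, isEligibleTarget) are invented for the example:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of tracking in-transit (scheduled but unconfirmed) block placements
// so the placement algorithm skips datanodes that already have a pending replica.
class PendingPlacements {
    // blockId -> datanodes with a replica scheduled but not yet confirmed
    private final Map<Long, Set<String>> pending = new HashMap<>();

    // Namenode schedules a replication: remember the target datanode.
    void schedule(long blockId, String datanode) {
        pending.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    // Datanode confirms receipt (blockReceived): placement is no longer in transit.
    void confirm(long blockId, String datanode) {
        Set<String> targets = pending.get(blockId);
        if (targets != null) {
            targets.remove(datanode);
            if (targets.isEmpty()) {
                pending.remove(blockId);
            }
        }
    }

    // Placement check: treat to-be-confirmed replicas like confirmed ones,
    // so a datanode is never scheduled for a block it is already receiving.
    boolean isEligibleTarget(long blockId, String datanode, Set<String> confirmedHolders) {
        if (confirmedHolders.contains(datanode)) {
            return false;
        }
        Set<String> targets = pending.get(blockId);
        return targets == null || !targets.contains(datanode);
    }
}
```

With this bookkeeping, a datanode that has a pending replica of a block is excluded from placement until it either confirms the block (and becomes a confirmed holder) or the pending entry is cleared, avoiding the "is valid, and cannot be written to" race.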

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

