From "Torsten Curdt (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1845) Datanodes get error message "is valid, and cannot be written to"
Date Fri, 07 Sep 2007 07:57:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525633 ]

Torsten Curdt commented on HADOOP-1845:
---------------------------------------

We can try to gather a (new) log, but if I remember correctly even the unit test had the exception (see the thread on dev). With my still limited knowledge of the codebase, and comparing to what is going on in the cluster, it does not seem to cause huge problems, so at first it was mainly disconcerting. But at some stage we had so many of these exceptions that they were essentially spamming the logs (more than a hundred exceptions per minute!), which made it a bit more than just disconcerting. Maybe the word "panic" was more suited ;) At the very least, OPS stopped believing us that this is not a problem, and that doesn't add to a good reputation for Hadoop. It also makes you lose sight of real problems, as it just drowns them in a sea of information.

So yeah ...we should really fix this.

> Datanodes get error message "is valid, and cannot be written to" 
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1845
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1845
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Hairong Kuang
>             Fix For: 0.15.0
>
>
> >> Copy from dev list:
> Our cluster has 4 nodes and i set the mapred.subimt.replication parameter to 2 on all
nodes and the master. Everything has been restarted.
> Unfortuantely, we still have the same exception :
> 2007-09-05 17:01:59,623 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Block blk_-5969983648201186681 is valid, and cannot be written to.
>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:515)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:822)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
>         at java.lang.Thread.run(Thread.java:595)
> >> end of copy
> The message shows that the namenode scheduled replication of a block to a datanode that already holds the block. The namenode's block placement algorithm makes sure that it does not schedule a block to a datanode that is confirmed to hold a replica of the block. But it is not aware of any in-transit block placements (i.e. placements that are scheduled but not yet confirmed), so occasionally we may still see "is valid, and cannot be written to" errors.
> A fix for the problem is to keep track of all in-transit block placements and have the block placement algorithm consider these to-be-confirmed replicas as well.
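
To make the proposed fix concrete, here is a minimal Java sketch of the bookkeeping the description calls for. All class, field, and method names below are hypothetical illustrations, not the actual Hadoop 0.14 namenode API: the namenode records every scheduled-but-unconfirmed transfer, and the placement chooser excludes those targets along with the confirmed replica holders.

    import java.util.*;

    /** Sketch of replica placement that is aware of in-transit
     *  (scheduled but not yet confirmed) block transfers. */
    class PendingAwarePlacement {
        // blockId -> datanodes a transfer of the block has been scheduled to
        private final Map<Long, Set<String>> pendingTargets =
            new HashMap<Long, Set<String>>();
        // blockId -> datanodes confirmed (via block reports) to hold a replica
        private final Map<Long, Set<String>> confirmedReplicas =
            new HashMap<Long, Set<String>>();

        /** Called when the namenode schedules replication of blockId to node. */
        synchronized void scheduleTransfer(long blockId, String node) {
            Set<String> targets = pendingTargets.get(blockId);
            if (targets == null) {
                targets = new HashSet<String>();
                pendingTargets.put(blockId, targets);
            }
            targets.add(node);
        }

        /** Called when the datanode confirms it has received the block. */
        synchronized void confirmTransfer(long blockId, String node) {
            Set<String> targets = pendingTargets.get(blockId);
            if (targets != null) {
                targets.remove(node);
                if (targets.isEmpty()) {
                    pendingTargets.remove(blockId);
                }
            }
            Set<String> replicas = confirmedReplicas.get(blockId);
            if (replicas == null) {
                replicas = new HashSet<String>();
                confirmedReplicas.put(blockId, replicas);
            }
            replicas.add(node);
        }

        /** Pick a target for blockId, skipping nodes that already hold the
         *  block or are about to receive it. Without the pending check, a
         *  node with an in-transit copy can be chosen again, producing the
         *  "is valid, and cannot be written to" error on the datanode. */
        synchronized String chooseTarget(long blockId, List<String> liveNodes) {
            Set<String> excluded = new HashSet<String>();
            Set<String> confirmed = confirmedReplicas.get(blockId);
            if (confirmed != null) {
                excluded.addAll(confirmed);
            }
            Set<String> pending = pendingTargets.get(blockId);
            if (pending != null) {
                excluded.addAll(pending); // the check 0.14 is missing
            }
            for (String node : liveNodes) {
                if (!excluded.contains(node)) {
                    return node;
                }
            }
            return null; // no eligible target right now; retry later
        }
    }

A real implementation would also have to expire pending entries whose transfers never complete (e.g. when the target datanode dies), so a stuck transfer does not block re-replication forever.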

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

