hadoop-common-dev mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Assigned: (HADOOP-2763) Replication Monitor timing out repeatedly
Date Fri, 01 Feb 2008 18:43:11 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler reassigned HADOOP-2763:
---------------------------------------

    Assignee: Tsz Wo (Nicholas), SZE

> Replication Monitor timing out repeatedly
> -----------------------------------------
>
>                 Key: HADOOP-2763
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2763
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>         Environment: Jan 28 nightly build
> With patches 2095, 2119, and 2723
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>
> I upgraded a Hadoop installation to the Jan 28 nightly build.
> DFS contains more than 5 million files.
> One hour after leaving safemode, fsck reported 5274 under-replicated blocks, 25 of them with only a single replica. Three hours later, 433 blocks were still under-replicated, 20 of them with a single replica.
> The namenode log shows repeated timeouts of the replication monitor for the same blocks:
> 2008-02-01 03:41:24,184 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask datanode to replicate blk_2984271423661664080 to datanode(s) datanode1 datanode2
> 2008-02-01 03:51:14,104 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_2984271423661664080
> 2008-02-01 03:51:22,303 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask datanode to replicate blk_2984271423661664080 to datanode(s) datanode3 datanode4
> 2008-02-01 04:01:14,150 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_2984271423661664080
> 2008-02-01 04:01:19,344 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask datanode to replicate blk_2984271423661664080 to datanode(s) datanode5 datanode6
> ...
> The source datanode appears to transmit the block successfully:
> 2008-02-01 03:42:06,284 INFO org.apache.hadoop.dfs.DataNode: datanode Starting thread to transfer block blk_2984271423661664080 to datanode1, datanode2
> 2008-02-01 03:42:09,535 INFO org.apache.hadoop.dfs.DataNode: datanode:Transmitted block blk_2984271423661664080 to /datanode1
> 2008-02-01 03:52:06,238 INFO org.apache.hadoop.dfs.DataNode: datanode Starting thread to transfer block blk_2984271423661664080 to datanode3,datanode4
> 2008-02-01 03:52:09,470 INFO org.apache.hadoop.dfs.DataNode: datanode:Transmitted block blk_2984271423661664080 to /datanode3
> The destination datanodes appear to have trouble receiving these blocks (log from a later attempt):
> 2008-02-01 06:43:06,541 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_2984271423661664080 from /datanode
> 2008-02-01 06:43:09,647 INFO org.apache.hadoop.dfs.DataNode: Exception in receiveBlock for block blk_2984271423661664080 java.net.SocketException: Connection reset
> 2008-02-01 06:43:09,647 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_2984271423661664080 received exception java.net.SocketException: Connection reset
> However, I was able to transfer the block between the two datanodes successfully using scp.
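The namenode-side pattern in the log (a replication request, a PendingReplicationMonitor timeout roughly ten minutes later, then a new request to different targets) can be illustrated with a minimal Java sketch. It is not Hadoop's implementation: the class PendingReplicationSketch, its methods, and the fixed ten-minute timeout are hypothetical, chosen only to show why a block whose new replica is never confirmed by a receiving datanode keeps cycling through request and timeout, even though the source datanode reports "Transmitted block".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a pending-replication tracker (not Hadoop code):
 * each replication request is timestamped, and a periodic check returns any
 * block whose request was not confirmed within a fixed timeout so it can be
 * re-queued for replication.
 */
public class PendingReplicationSketch {
    private final long timeoutMillis;
    // block ID -> time at which the replication request was issued
    private final Map<String, Long> pending = new HashMap<>();

    public PendingReplicationSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record that a datanode was asked to replicate this block. */
    public void requested(String blockId, long nowMillis) {
        pending.put(blockId, nowMillis);
    }

    /** A target datanode reported the new replica; stop tracking the block. */
    public void confirmed(String blockId) {
        pending.remove(blockId);
    }

    /** Remove and return every block whose request is older than the timeout. */
    public List<String> timedOut(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > timeoutMillis) {
                expired.add(e.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L; // assumed timeout, for illustration only
        PendingReplicationSketch monitor = new PendingReplicationSketch(tenMinutes);
        monitor.requested("blk_2984271423661664080", 0L);
        // The receiving side fails (e.g. "Connection reset"), so confirmed()
        // is never called for this block.
        System.out.println(monitor.timedOut(5 * 60 * 1000L));  // []
        System.out.println(monitor.timedOut(11 * 60 * 1000L)); // [blk_2984271423661664080]
    }
}
```

In this model the only way out of the loop is a confirmation from a target datanode; a successful send on the source side alone never clears the pending entry.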

-- 
This message is automatically generated by JIRA.

