Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <8810053.1177976775487.JavaMail.jira@brutus>
Date: Mon, 30 Apr 2007 16:46:15 -0700 (PDT)
From: "Michael Bieniosek (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1309) DFS logging in
 NameSystem.pendingTransfer consumes all disk space
In-Reply-To: <30277417.1177967475313.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492817 ] 

Michael Bieniosek commented on HADOOP-1309:
-------------------------------------------

Here's another instance, seen while trying to add a new node to my cluster:

2007-04-30 23:10:18,040 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from x.y.z.237:50010
2007-04-30 23:10:18,040 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/x.y.z.237:50010
2007-04-30 23:10:18,040 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from x.y.z.237:50010
2007-04-30 23:10:18,040 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/x.y.z.237:50010
2007-04-30 23:10:18,040 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from x.y.z.237:50010

> DFS logging in NameSystem.pendingTransfer consumes all disk space
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1309
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1309
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Michael Bieniosek
>
> Sometimes the namenode goes crazy.  I see this in my logs:
> 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate blk_-9064654741761822118 to datanode(s) x.y.z.247:50010
> 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate blk_-8996500637974689840 to datanode(s) x.y.z.225:50010
> 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate blk_-8870980160272831217 to datanode(s) x.y.z.244:50010
> 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate blk_-8721101562083234290 to datanode(s) x.y.z.250:50010
> 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.250:50010 to replicate blk_-9044741671491162229 to datanode(s) x.y.z.244:50010
> These messages are logged at a rate on the order of 10k/sec until the machine runs out of disk space.
> I notice that in FSNamesystem.java, about 10 lines above where this message is logged, there is a comment:
>         //
>         // Move the block-replication into a "pending" state.
>         // The reason we use 'pending' is so we can retry
>         // replications that fail after an appropriate amount of time.
>         // (REMIND - mjc - this timer is not yet implemented.)
>         //
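
For illustration only, the retry timer that the comment above describes as not yet implemented could look roughly like the following. This is a minimal sketch, not the FSNamesystem code: the class PendingReplicationSketch and its methods markPending, confirm and takeTimedOut are hypothetical names, and block names are plain strings here. It also does not establish that the missing timer is the cause of the log flood; it only shows the "pending with timeout" idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a pending-replication table with a timeout. */
public class PendingReplicationSketch {
    private final long timeoutMillis;
    // block name -> time (ms) at which the replication request was issued
    private final Map<String, Long> pending = new HashMap<String, Long>();

    public PendingReplicationSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Records a replication request. Returns false if the block is already
     * pending, so the caller can skip re-asking (and re-logging).
     */
    public synchronized boolean markPending(String block, long nowMillis) {
        if (pending.containsKey(block)) {
            return false;
        }
        pending.put(block, Long.valueOf(nowMillis));
        return true;
    }

    /** Called when a datanode reports that the replica now exists. */
    public synchronized void confirm(String block) {
        pending.remove(block);
    }

    /**
     * Removes and returns the blocks whose requests are at least
     * timeoutMillis old, so they can be scheduled for replication again.
     */
    public synchronized List<String> takeTimedOut(long nowMillis) {
        List<String> expired = new ArrayList<String>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue().longValue() >= timeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

In this sketch a caller would log the "ask ... to replicate" message only when markPending returns true, and would periodically call takeTimedOut with the current time to retry requests that were never confirmed. Timestamps are passed in as arguments rather than read from the system clock so the behavior is easy to check.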

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.