hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3132) DFS writes stuck occasionally
Date Thu, 17 Apr 2008 00:11:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589779#action_12589779 ]

rangadi edited comment on HADOOP-3132 at 4/16/08 5:10 PM:
---------------------------------------------------------------

Making this a non-blocker and moving it to 0.18 because:
# It is not fatal; DFS writes and tasks recover from it.
# It happens very rarely; so far we know of only one cluster where it happens.
# It mostly looks like a bug outside Hadoop and the JRE (so it may not be present on different kernel versions, hardware, OSes, or switches).

Why the delay in diagnosis: the problem is hard to reproduce and requires a specific 500-node cluster.

> DFS writes stuck occasionally
> -----------------------------
>
>                 Key: HADOOP-3132
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3132
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Runping Qi
>            Assignee: Raghu Angadi
>             Fix For: 0.18.0
>
>
> This problem happens in the 0.17 trunk.
> As reported in HADOOP-3124,
> I saw reducers wait 10 minutes while writing data to DFS and then time out.
> The client retried and timed out again after another 19 minutes.
> During the period the write was stuck, all the nodes in the datanode pipeline were functioning fine.
> The system load was normal.
> I don't believe this was due to slow network cards/disk drives or overloaded machines.
> I believe this and HADOOP-3033 are related somehow.
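
For reference, the write path involved here is an ordinary sequential write through the DFS client. The following is a minimal, hypothetical probe of that kind of write using the standard org.apache.hadoop.fs.FileSystem API; the output path, buffer size, and byte count are illustrative and not taken from the issue.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsWriteProbe {
      public static void main(String[] args) throws Exception {
        // Picks up the filesystem URI from the usual site configuration files.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical output path and sizes, only for illustration.
        Path path = new Path("/tmp/dfs-write-probe");
        byte[] buf = new byte[64 * 1024];
        long bytesToWrite = 1L << 30; // 1 GB

        long start = System.currentTimeMillis();
        FSDataOutputStream out = fs.create(path, true);
        try {
          for (long written = 0; written < bytesToWrite; written += buf.length) {
            out.write(buf);
          }
        } finally {
          // A stall in the datanode pipeline would show up here or in write()
          // as a long pause followed by a client-side timeout.
          out.close();
        }
        System.out.println("wrote " + bytesToWrite + " bytes in "
            + (System.currentTimeMillis() - start) + " ms");
      }
    }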

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

