hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1332) Sporadic unit test failures (TestMiniMRClasspath, TestMiniMRLocalFS, TestMiniMRDFSCaching)
Date Wed, 30 May 2007 17:11:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500198
] 

Raghu Angadi commented on HADOOP-1332:
--------------------------------------


But it is a blocking read and we are not reading just what is available. So it only makes
the problem less severe but not fix it. Does read() or interrupt explicitly state that it
does not get interrupted?


> Sporadic unit test failures (TestMiniMRClasspath, TestMiniMRLocalFS, TestMiniMRDFSCaching)
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1332
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: copy-thread.patch, HADOOP-1332_1_20070529.patch, HADOOP-1332_2_20070530.patch
>
>
> Since April 22 I've been seeing sporadic failures of these tests on Windows:
>  - TestMiniMRClasspath
>  - TestMiniMRLocalFS
>  - TestMiniMRDFSCaching
> The change log since then can be viewed starting at build 66:
> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/66/
> The tests fail because they timeout.  
> They timeout because one of the task trackers doesn't go idle.
> One of the task trackers doesn't go idle because a map fails and has to be killed.
> Reordered and annotated tests logs (my comments are in parenthesis):
>     (map 0 executes and logs a 'done' and two 'completed' messages):
>     [junit] 2007-05-03 21:19:40,516 INFO  mapred.JobInProgress (JobInProgress.java:findNewTask(653))
- Choosing cached task tip_0001_m_000000
>     [junit] 2007-05-03 21:19:40,516 INFO  mapred.JobTracker (JobTracker.java:createTaskEntry(758))
- Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker_task000.com:2893'
>     [junit] 2007-05-03 21:19:40,516 INFO  mapred.TaskTracker (TaskTracker.java:startNewTask(1071))
- LaunchTaskAction: task_0001_m_000000_0
>     [junit] 2007-05-03 21:19:45,655 INFO  mapred.TaskTracker (TaskTracker.java:reportProgress(1284))
- task_0001_m_000000_0 1.0% hdfs://localhost:2882/testing/ext/input/part-0:0+10
>     [junit] 2007-05-03 21:19:46,201 INFO  mapred.TaskTracker (TaskTracker.java:reportProgress(1284))
- task_0001_m_000000_0 1.0% hdfs://localhost:2882/testing/ext/input/part-0:0+10
>     [junit] 2007-05-03 21:19:46,201 INFO  mapred.TaskTracker (TaskTracker.java:reportDone(1334))
- Task task_0001_m_000000_0 is done.
>     [junit] 2007-05-03 21:19:46,357 INFO  mapred.JobInProgress (JobInProgress.java:completedTask(734))
- Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
>     [junit] 2007-05-03 21:19:46,357 INFO  mapred.TaskInProgress (TaskInProgress.java:completedTask(475))
- Task 'task_0001_m_000000_0' has completed.
>     (map 2 executes and logs a 'done' message but no 'completed' messages. It is eventually
killed):
>     [junit] 2007-05-03 21:19:40,594 INFO  mapred.JobInProgress (JobInProgress.java:findNewTask(653))
- Choosing cached task tip_0001_m_000002
>     [junit] 2007-05-03 21:19:40,594 INFO  mapred.JobTracker (JobTracker.java:createTaskEntry(758))
- Adding task 'task_0001_m_000002_0' to tip tip_0001_m_000002, for tracker 'tracker_task002.com:2902'
>     [junit] 2007-05-03 21:19:40,594 INFO  mapred.TaskTracker (TaskTracker.java:startNewTask(1071))
- LaunchTaskAction: task_0001_m_000002_0
>     [junit] 2007-05-03 21:19:46,295 INFO  mapred.TaskTracker (TaskTracker.java:reportProgress(1284))
- task_0001_m_000002_0 0.0% hdfs://localhost:2882/testing/ext/input/part-0:20+10
>     [junit] 2007-05-03 21:19:46,310 INFO  mapred.TaskTracker (TaskTracker.java:reportDone(1334))
- Task task_0001_m_000002_0 is done.
>     ...
>     [junit] 2007-05-03 21:29:52,957 INFO  mapred.TaskTracker (TaskTracker.java:markUnresponsiveTasks(909))
- task_0001_m_000002_0: Task failed to report status for 606 seconds. Killing.
>     (long thread dump)
>     (shutting down MiniMRCluster)
>     [junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
>     [junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
>     [junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
>     [junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
>     [junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
>     [junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
>     ... (repeated until the test times out)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message