hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-1764) Inconsistancy between Mapper/Reducer book keeping
Date Thu, 30 Aug 2007 12:36:54 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved HADOOP-1764.
-----------------------------------

    Resolution: Invalid

Srikanth - I'm closing this one; please feel free to open another issue if you see fit. Thanks!

> Inconsistancy between Mapper/Reducer book keeping
> -------------------------------------------------
>
>                 Key: HADOOP-1764
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1764
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>         Environment: Related: HADOOP-1763 (Same environment)
> Version: 0.15.0-dev, r565628
> Compiled: Tue Aug 14 20:55:37 UTC 2007 by hadoopqa
> 1400 Nodes
>            Reporter: Srikanth Kakani
>            Assignee: Arun C Murthy
>            Priority: Blocker
>
> Refer to HADOOP-1763
> This occurs in that scenario: once many task trackers are lost, reducers do not know where the map outputs are. They keep retrying the wrong node, so the reducers run forever without failing.
> Relevant logs:
> Reducer output:
> 2007-08-21 09:47:47,046 INFO org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
Copying task_200708210155_0003_m_002598_0 output from node50
> 2007-08-21 09:47:53,643 WARN org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
copy failed: task_200708210155_0003_m_002598_0 from node50
> 2007-08-21 09:47:53,643 WARN org.apache.hadoop.mapred.ReduceTask: java.io.FileNotFoundException:
http://wm511750.inktomisearch.com:50060/mapOutput?map=task_200708210155_0003_m_002598_0&reduce=6
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1243)
> 	at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:207)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:673)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:631)
> 2007-08-21 09:53:02,327 INFO org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
Copying task_200708210155_0003_m_002598_0 output from node50
> 2007-08-21 09:53:02,333 WARN org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
copy failed: task_200708210155_0003_m_002598_0 from node50
> 2007-08-21 09:53:02,333 WARN org.apache.hadoop.mapred.ReduceTask: java.io.FileNotFoundException:
http://node50:50060/mapOutput?map=task_200708210155_0003_m_002598_0&reduce=6
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1243)
> 	at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:207)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:673)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:631)
> 2007-08-21 09:57:33,899 INFO org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
Copying task_200708210155_0003_m_002598_0 output from node50.inktomisearch.com.
> 2007-08-21 09:57:33,908 WARN org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
copy failed: task_200708210155_0003_m_002598_0 from node50.inktomisearch.com
> 2007-08-21 09:57:33,908 WARN org.apache.hadoop.mapred.ReduceTask: java.io.FileNotFoundException:
http://node50:50060/mapOutput?map=task_200708210155_0003_m_002598_0&reduce=6
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1243)
> 	at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:207)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:673)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:631)
> 2007-08-21 10:00:56,337 INFO org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
Copying task_200708210155_0003_m_002598_1 output from node75.inktomisearch.com.
> 2007-08-21 10:00:56,342 INFO org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
done copying task_200708210155_0003_m_002598_1 output from node75
> 2007-08-21 10:02:17,486 INFO org.apache.hadoop.mapred.ReduceTask: task_200708210155_0003_r_000006_2
Ignoring obsolete copy result for Map Task: task_200708210155_0003_m_002598_0 from host: node50
> Looking at TIP task_200708210155_0003_m_002598:
> task_200708210155_0003_m_002598_0	node50	KILLED	0.00%		21-Aug-2007 09:38:49	Lost task tracker
> task_200708210155_0003_m_002598_1	node75	KILLED	0.00%		21-Aug-2007 11:22:42	Lost task tracker
> task_200708210155_0003_m_002598_2	node55	SUCCEEDED	100.00%	21-Aug-2007 11:22:46	21-Aug-2007 11:27:19 (4mins, 33sec)
> task_200708210155_0003_m_002598_3	node49	KILLED	100.00%	21-Aug-2007 11:22:48	21-Aug-2007 11:27:48 (4mins, 59sec)	Already completed TIP
> Notes:
> 1. Even at the end, the reducer seems to fetch data from the wrong TaskTracker; it does not check with the job tracker for the final/correct map output.
> 2. It seems to retry more times and sleep for longer (judging by the interval between log messages).
> 3. An obvious solution may be to go to the job tracker and directly get the correct map output location (I was able to get the correct map output from node55 over HTTP, without any errors).
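
The idea in note 3 can be sketched roughly as below. This is an illustrative Python model, not Hadoop code: `MapOutputLocation` here is a stand-in data class, `lookup_location` represents a hypothetical query to the job tracker for the currently valid map attempt, and `fetch` represents the HTTP copy from a task tracker. All names and the retry limit are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MapOutputLocation:
    """Stand-in for 'which attempt of a map task, on which host' (assumed shape)."""
    attempt_id: str  # e.g. "task_200708210155_0003_m_002598_2"
    host: str        # e.g. "node55"


def fetch_map_output(
    lookup_location: Callable[[], Optional[MapOutputLocation]],
    fetch: Callable[[MapOutputLocation], bytes],
    max_attempts: int = 5,
) -> bytes:
    """Fetch one map's output, re-asking the tracker for the location before every try.

    Instead of retrying a cached (possibly stale) host, each attempt starts
    with a fresh lookup, so a re-executed map is picked up on the next try.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        location = lookup_location()  # hypothetical job tracker query
        if location is None:
            # No successful attempt is known yet; real code would wait here.
            continue
        try:
            return fetch(location)
        except FileNotFoundError as err:
            # The output is gone from that host; look the location up again.
            last_error = err
    raise RuntimeError(
        f"map output unavailable after {max_attempts} attempts"
    ) from last_error


# Usage: the tracker first reports the lost attempt on node50, then the
# successful re-execution on node55.
answers = iter([
    MapOutputLocation("task_200708210155_0003_m_002598_0", "node50"),
    MapOutputLocation("task_200708210155_0003_m_002598_2", "node55"),
])


def demo_fetch(location: MapOutputLocation) -> bytes:
    if location.host != "node55":
        raise FileNotFoundError(location.host)
    return b"map-output"


result = fetch_map_output(lambda: next(answers), demo_fetch)
```

With a bounded number of attempts, a reducer in this model fails visibly rather than retrying a dead node indefinitely.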

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

