hadoop-mapreduce-issues mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1171) Lots of fetch failures
Date Sun, 01 Nov 2009 03:32:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772300#action_12772300 ]

Christian Kunz commented on MAPREDUCE-1171:
-------------------------------------------

The algorithm in hadoop-0.20 that determines when to mark a map task as having too many fetch
failures seems to have big problems with jobs that have a few outliers in map task runtimes.

The job mentioned in the description has a couple of map tasks requiring about 3 hours of
runtime (1 hr application execution time and 2 hrs merge phase), while the overall average map
runtime is much smaller.
The consequence is that all reduces just wait for these few map tasks to finish, and these
maps get declared 'failed' very quickly because of many socket timeout exceptions on the
reduce side, although there is nothing wrong with the map tasks themselves.
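
For context, the JobTracker-side bookkeeping in hadoop-0.20 looks roughly like the following (a simplified, self-contained sketch reconstructed from memory of JobInProgress; the names and the threshold value are approximate, not the exact source):
{noformat}
import java.util.HashMap;
import java.util.Map;

// Sketch of the hadoop-0.20 JobTracker logic that fails a map after a few
// fetch-failure notifications (reconstruction; names/values approximate).
public class FetchFailureSketch {
    // Assumed 0.20 threshold: a handful of notifications is enough.
    static final int MAX_FETCH_FAILURES_NOTIFICATIONS = 3;

    private final Map<String, Integer> fetchFailuresPerMap = new HashMap<>();

    // Called each time any reduce reports a failed fetch for this map attempt.
    synchronized void fetchFailureNotification(String mapTaskId) {
        int failures = fetchFailuresPerMap.merge(mapTaskId, 1, Integer::sum);
        if (failures >= MAX_FETCH_FAILURES_NOTIFICATIONS) {
            // A possibly perfectly healthy map attempt is declared failed and
            // re-executed -- exactly what hits a long-running straggler whose
            // output thousands of reduces try to fetch at once.
            System.out.println(mapTaskId + " declared failed, re-executing");
            fetchFailuresPerMap.remove(mapTaskId);
        }
    }

    public static void main(String[] args) {
        FetchFailureSketch jobTracker = new FetchFailureSketch();
        // Three reduces hitting a socket timeout on the same straggler
        // already cross the threshold:
        for (int reduce = 0; reduce < 3; reduce++) {
            jobTracker.fetchFailureNotification("attempt_x_m_000001_0");
        }
    }
}
{noformat}
With 3000 reduces fetching from the same map in parallel, a handful of socket timeouts crosses such a threshold within seconds, which matches the "declared as failed within less than 1 minute" behavior described below.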

The first attempt to run the job failed after 28 hrs (we allow up to 10 map task failures),
with only about 200 reduces successfully fetching map output each time the long-running map
tasks completed; these map tasks were declared failed less than 1 minute after completion.

We deployed an emergency fix in the ReduceTask, commenting out the portion where a single
read error results in a report:
{noformat}
// Emergency fix: a single READ_ERROR no longer triggers an immediate
// fetch-failure report; only repeated failed fetches do.
if (//cr.getError().equals(CopyOutputErrorType.READ_ERROR) ||
    ((noFailedFetches >= fetchRetriesPerMap)
     && ((noFailedFetches - fetchRetriesPerMap) % 2) == 0)) {
  synchronized (ReduceTask.this) {
    taskStatus.addFetchFailedMap(mapTaskId);
{noformat}
and it helped somewhat.
The job finished after 6 re-executions of the long-running map tasks (fortunately, we allow
a high number of map task failures), but it still took 18 hrs instead of the optimal less than
4 hrs. About 600 reduces got map output successfully before the map tasks were declared failed,
always because of 5+ socket timeout exceptions (one advantage of hadoop-0.20, retrying a fetch
within a few seconds, was counterproductive here).
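
A back-of-the-envelope calculation makes this plausible even with the fix in place. Only the 3000 reduces, the ~600 served, and the 5+ timeouts per report come from our runs; the per-attempt time and the JobTracker threshold below are assumptions for illustration:
{noformat}
// Back-of-the-envelope sketch; assumed values are marked as such.
public class ThunderingHerdMath {
    public static void main(String[] args) {
        int reduces = 3000;                  // from the job
        int servedBeforeFailure = 600;       // from the job
        int timeoutsBeforeReport = 5;        // fetchRetriesPerMap, from the job
        double secondsPerAttempt = 10.0;     // assumed: timeout + quick retry
        int jobTrackerThreshold = 3;         // assumed 0.20 notification limit

        int starved = reduces - servedBeforeFailure;
        double secondsUntilReports = timeoutsBeforeReport * secondsPerAttempt;
        System.out.printf(
            "~%d reduces each start reporting after ~%.0fs; only %d reports "
            + "are needed to fail the map%n",
            starved, secondsUntilReports, jobTrackerThreshold);
    }
}
{noformat}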

With hadoop-0.18.3 this job needed less than 9 hrs (not optimal, but we could live with it).

Takeaway:
The current algorithm is not general enough to cover all corner cases, in particular jobs with
a handful of much longer-running map tasks.
It should be improved (there should be an allowance for socket timeout exceptions when a high
number of reduces fetch output from a single map task in parallel), or there should be a
configuration option to override the threshold on a per-job basis.
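
Such a per-job override could be as small as the sketch below; the configuration key is purely hypothetical (invented for illustration, nothing like it exists in 0.20), and MAX_FETCH_FAILURES_NOTIFICATIONS stands for the current cluster-wide constant:
{noformat}
// Hypothetical per-job override of the fetch-failure threshold (sketch only;
// the configuration key below is invented for illustration).
int maxNotifications = job.getInt(
    "mapred.job.max.fetch.failure.notifications",  // hypothetical key
    MAX_FETCH_FAILURES_NOTIFICATIONS);             // cluster-wide default
if (fetchFailures >= maxNotifications) {
  failMap(mapTaskId, "Too many fetch-failures");
}
{noformat}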

> Lots of fetch failures
> ----------------------
>
>                 Key: MAPREDUCE-1171
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.20.1
>            Reporter: Christian Kunz
>
> Since we upgraded from hadoop-0.18.3 to hadoop-0.20.1, we see a lot more map task failures
because of 'Too many fetch-failures'.
> One of our jobs makes hardly any progress, because 3000 reduces are unable to get the map
output of 2 trailing maps (with about 80GB of output each), which are repeatedly marked as
failed because reduces cannot fetch their output.
> One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed map output
fetch even after a single try when it was a read error (cr.getError().equals(CopyOutputErrorType.READ_ERROR)).
I do not think this is a good idea, as trailing map tasks will be hit by all reduces
simultaneously.
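> For reference, the relevant hadoop-0.20 condition in ReduceTask (the same code shown, patched, in the comment above) short-circuits on a single read error before any retry counting:
> {noformat}
> if (cr.getError().equals(CopyOutputErrorType.READ_ERROR) ||
>     ((noFailedFetches >= fetchRetriesPerMap)
>      && ((noFailedFetches - fetchRetriesPerMap) % 2) == 0)) {
>   synchronized (ReduceTask.this) {
>     taskStatus.addFetchFailedMap(mapTaskId);
> {noformat}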
> Here is a log output of a reduce task:
> {noformat}
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200910281903_0028_r_000000_0 copy failed: attempt_200910281903_0028_m_002781_1 from some host
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_200910281903_0028_r_000000_0: Failed fetch #1 from attempt_200910281903_0028_m_002781_1
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_200910281903_0028_m_002781_1 even after MAX_FETCH_RETRIES_PER_MAP retries... or it is a read error, reporting to the JobTracker.
> {noformat}
> I also saw a few suspicious-looking log messages, as if successfully fetched map output
is discarded because the map has been marked as failed (due to too many fetch failures).
This would make the situation even worse.
> {noformat}
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed len: 23967845
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 23967845 bytes (21882555 raw bytes) into RAM from attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200910281903_0028_m_001076_0 -> (20, 39772) from some host
> ...
> 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
> {noformat}


