hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
Date Thu, 07 Mar 2013 19:57:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596289#comment-13596289 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5042:
----------------------------------------------------

In my preliminary security work, I first had the JobClient generate the secret; later I had
the MR AM generate the tokens and re-upload the tokens file into the submit directory. That
was another hop to DFS, and we have since changed it, but this recovery-code bug fell through.
So there are multiple solutions:
 - Have a single secret but let the client generate it
 - Have a single secret but upload the tokens file for future app-attempts
 - Have multiple tokens
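
As a rough illustration of the second option (one secret, persisted for later app-attempts),
here is a minimal Java sketch. It is not the actual MR AM code: a local temp directory stands
in for the DFS submit directory, and the class name, the method loadOrCreateSecret and the
file name "appTokens" are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical sketch: the first app-attempt creates the secret and stores it;
// every later attempt reads the stored copy instead of generating a new one.
class SecretReuseSketch {

    // Returns the stored secret if the tokens file exists, otherwise creates,
    // stores and returns a fresh 20-byte secret.
    static byte[] loadOrCreateSecret(Path tokensFile) throws IOException {
        if (Files.exists(tokensFile)) {
            return Files.readAllBytes(tokensFile);
        }
        byte[] secret = new byte[20];
        new SecureRandom().nextBytes(secret);
        Files.write(tokensFile, secret);
        return secret;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local temp directory stands in for the submit directory.
        Path submitDir = Files.createTempDirectory("submit-dir");
        Path tokensFile = submitDir.resolve("appTokens");

        byte[] firstAttempt = loadOrCreateSecret(tokensFile);   // generates and stores
        byte[] secondAttempt = loadOrCreateSecret(tokensFile);  // reads the stored copy

        System.out.println("secret reused: " + Arrays.equals(firstAttempt, secondAttempt));
    }
}
```

The extra write in the first attempt corresponds to the additional DFS hop mentioned above.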

Separating the task and shuffle security secrets would be future-proof, but I'm not sure it
is directly tied to this issue if we go with the reupload solution.

I don't feel strongly about any of these solutions, but one thing to keep in mind is that we
should move as much as possible into the AM, so that the client stays thin and we can support
job submission via web services.
                
> Reducer unable to fetch for a map task that was recovered
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, security
>    Affects Versions: 0.23.7, 2.0.5-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch
>
>
> If an application attempt fails and is relaunched the AM will try to recover previously
> completed tasks.  If a reducer needs to fetch the output of a map task attempt that was recovered
> then it will fail with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_000016_0
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see the shuffle failed due to "Verification
> of the hashReply failed".
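
To make the failure mode concrete, here is a simplified Java sketch of a keyed-hash check on
a shuffle URL. It is only an illustration under stated assumptions, not Hadoop's shuffle
code: the class and method names are hypothetical, HMAC-SHA1 is assumed as the hash, and it
assumes (one reading of the discussion above) that the relaunched attempt ends up with a
different secret than the one held for the recovered map output.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: the fetching side signs the request URL with its secret,
// and the serving side recomputes the hash with the secret it holds.
class ShuffleHashSketch {

    // Computes an HMAC-SHA1 of the URL, keyed by the given secret.
    static byte[] urlHash(byte[] secret, String url) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        return mac.doFinal(url.getBytes(StandardCharsets.UTF_8));
    }

    // Server-side check: true only if the client's hash matches the hash
    // recomputed with the server's own copy of the secret.
    static boolean verify(byte[] serverSecret, String url, byte[] clientHash) throws Exception {
        return MessageDigest.isEqual(urlHash(serverSecret, url), clientHash);
    }

    // Generates a fresh random 20-byte secret.
    static byte[] newSecret() {
        byte[] secret = new byte[20];
        new SecureRandom().nextBytes(secret);
        return secret;
    }

    public static void main(String[] args) throws Exception {
        String url = "/mapOutput?job=job_1361569180491_21845&reduce=0"
                + "&map=attempt_1361569180491_21845_m_000016_0";

        byte[] firstAttemptSecret = newSecret();   // secret known to the serving side
        byte[] secondAttemptSecret = newSecret();  // secret regenerated after relaunch

        // Same secret on both sides: the check passes.
        System.out.println("same secret: "
                + verify(firstAttemptSecret, url, urlHash(firstAttemptSecret, url)));

        // Fetcher signs with the new secret: the check fails, which a server
        // would typically report as an authorization error such as HTTP 401.
        System.out.println("regenerated secret: "
                + verify(firstAttemptSecret, url, urlHash(secondAttemptSecret, url)));
    }
}
```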

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
