hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5394) JobTracker might schedule 2 attempts of the same task with the same attempt id across restarts
Date Fri, 03 Apr 2009 11:38:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695342#action_12695342 ]

Amar Kamat commented on HADOOP-5394:
------------------------------------

TestSocketFactory tests whether clients can connect to the server through a custom socket
factory. It works as follows:
# Define a socket factory that connects to (_port_ - 10) instead of _port_.
# Start the server.
# Configure a client conf to use this socket factory implementation, with the server URL set to _hostname:port+10_.
# At the client, the socket factory subtracts 10 from the port and is thus able to connect to the server.
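
The port-shifting trick in the steps above can be sketched as follows. This is a minimal illustration built only on the JDK's javax.net.SocketFactory, not the factory that TestSocketFactory actually uses; the class name PortShiftSocketFactory and the constant SHIFT are hypothetical, and the unconnected-socket path assumes the caller passes a resolved InetSocketAddress.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;
import javax.net.SocketFactory;

/** Hypothetical factory: every connection goes to (requested port - SHIFT). */
class PortShiftSocketFactory extends SocketFactory {
    static final int SHIFT = 10;

    @Override
    public Socket createSocket() {
        // Unconnected socket that rewrites the port when connect() is called.
        // Assumes the address is a resolved InetSocketAddress.
        return new Socket() {
            @Override
            public void connect(SocketAddress addr, int timeout) throws IOException {
                InetSocketAddress a = (InetSocketAddress) addr;
                super.connect(
                    new InetSocketAddress(a.getAddress(), a.getPort() - SHIFT), timeout);
            }
        };
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return new Socket(host, port - SHIFT);
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        return new Socket(host, port - SHIFT, localHost, localPort);
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return new Socket(host, port - SHIFT);
    }

    @Override
    public Socket createSocket(InetAddress host, int port, InetAddress localHost,
            int localPort) throws IOException {
        return new Socket(host, port - SHIFT, localHost, localPort);
    }
}
```

A client that is told the server listens on _port+10_ and uses such a factory ends up connecting to _port_. Any address that was not shifted by +10 beforehand would be sent to the wrong port by the same factory.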

This doesn't work with the current patch because the JobTracker tries to create a file on the
DataNode using the socket factory, but the DataNode info passed to the JobTracker is correct
(i.e. no +10 is applied), so the factory's -10 yields the wrong port. The DataNode information
can't be changed, as it is obtained from the NameNode. Hence this patch starts the JobTracker
with the correct conf and not the modified conf. The JobTracker-to-NameNode connection need not
be checked, because the DFSClient-to-NameNode connection is already checked and, for the
NameNode, the JobTracker is just a client.

> JobTracker might schedule 2 attempts of the same task with the same attempt id across restarts
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5394
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5394
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>            Priority: Critical
>         Attachments: HADOOP-5394-v1.10.patch, HADOOP-5394-v1.2.patch, HADOOP-5394-v1.5.patch, HADOOP-5394-v1.9.1.patch
>
>
> This can happen when the jobtracker gets restarted more than once. In such cases, the
> jobtracker depends on the jobhistory file for the next restart count. If the new restart-count
> is not flushed to the file, then there is a fair chance that upon the next restart the jobtracker
> might schedule a new attempt with an existing id. This can cause problems not only with the
> side-effect files but can also leave the jobtracker in an inconsistent state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

