hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2447) Distcp with hdfs:// passed with error in JT log while copying from .20.204 to .20.205 ( with useIp=false)
Date Wed, 19 Oct 2011 18:15:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130845#comment-13130845 ]

Daryn Sharp commented on HDFS-2447:

The problem turned out to be that the JT couldn't contact the remote NN to renew a token because
of a firewall.  The tasks on the DNs were still able to contact the remote NN, so the job
succeeded.  However, the job would have failed if it had run past the token expiration, since
the JT was unable to renew the token.

If the JT has to acquire tokens for a job, and acquisition fails, the job will fail.  This
is the ideal behavior, but there's a loophole...  If the JT finds the token in the job's token
cache, then it "assumes" the token must be valid.  The reality may be that the token is invalid,
canceled, long expired, or the NN can't even be reached.  In all of these cases, the tasks
get fired off anyway, just to clog up a cluster while they die a long slow death.  Actually,
on 23, it's been observed that tasks using an invalid token will pound on the NN every second
-- on one cluster this happened for a month!

The JT immediately issues a token renewal and then uses a timer for future renewals.  However,
all renewals are done in a thread, which means that if the initial renewal fails because the token
is bad, the job starts anyway.  The simple solution is for the first renewal to occur in the
job's context so an exception will kill the job, and future renewals to remain thread-based.
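
A minimal sketch of that idea, not the actual JT code: the class name FirstRenewalSketch, the RenewableToken interface, and the "renew halfway to expiry" delay are all hypothetical stand-ins chosen for illustration.  The first renew() call runs on the submitting thread so its exception propagates to the caller, while later renewals run on a timer thread.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names are Hadoop classes.
class FirstRenewalSketch {

    // Stand-in for a delegation token that can be renewed against a remote NN.
    interface RenewableToken {
        // Returns the new expiration time (ms since epoch) or throws on failure.
        long renew() throws IOException;
    }

    private final ScheduledExecutorService timer;

    FirstRenewalSketch(ScheduledExecutorService timer) {
        this.timer = timer;
    }

    // Runs in the job's context: a bad or unreachable token throws here,
    // so the job fails at submission instead of launching doomed tasks.
    long registerToken(RenewableToken token) throws IOException {
        long expiry = token.renew();   // first renewal is synchronous
        scheduleNext(token, expiry);   // later renewals are timer-based
        return expiry;
    }

    // Assumption: renew halfway to expiry, but never sooner than one second.
    private void scheduleNext(RenewableToken token, long expiry) {
        long delayMs = Math.max(1000L, (expiry - System.currentTimeMillis()) / 2);
        timer.schedule(() -> {
            try {
                scheduleNext(token, token.renew());
            } catch (IOException e) {
                // Background failure: this sketch only reports it.
                System.err.println("background renewal failed: " + e.getMessage());
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        FirstRenewalSketch jt = new FirstRenewalSketch(timer);
        try {
            // Simulates the firewall case: the remote NN cannot be reached.
            jt.registerToken(() -> { throw new IOException("remote NN unreachable"); });
            System.out.println("job started");
        } catch (IOException e) {
            System.out.println("job failed at submission: " + e.getMessage());
        } finally {
            timer.shutdownNow();
        }
    }
}
```

In this sketch a token found in the job's token cache would go through the same registerToken() path, so it is checked once before any task is launched rather than assumed valid.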
> Distcp with hdfs:// passed with error in JT log while copying from .20.204 to .20.205 ( with useIp=false)
> ----------------------------------------------------------------------------------------------------------
>                 Key: HDFS-2447
>                 URL: https://issues.apache.org/jira/browse/HDFS-2447
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security
>    Affects Versions:
>            Reporter: Rajit Saha
>            Assignee: Daryn Sharp
> I tried to copy a file from .20.204 to .20.205 by distcp over hdfs:// while using hadoop.security.token.service.use_ip=false
> in core-site.xml. The copy was successful, but I found an "org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal:"
> exception in the .20.205 JT log.


