hadoop-common-issues mailing list archives

From "He Xiaoqiao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved
Date Sun, 28 Oct 2018 13:56:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666405#comment-16666405 ]

He Xiaoqiao commented on HADOOP-15864:
--------------------------------------

Thanks [~ayushtkn] for the feedback. I rechecked the failed UT (#TestWebHdfsFileSystemContract) and retested it
on my local machine; the failure is related to this issue.
The main reason:
Before the patch, {{WebHdfsHandler}} on the DataNode hits an {{IllegalArgumentException}} in {{SecurityUtil#buildTokenService}}
when it creates a DFSClient instance via {{newDfsClient(nnId, confForCreate);}} while handling the
{{onCreate}} event, because the client does not pass the #namenoderpcaddress parameter. As a result, the client receives
HTTP status code 400.
After the patch, the DataNode does *NOT* hit {{IllegalArgumentException}} when creating the DFSClient instance.
However, when the DataNode creates the wrapped output stream, it hits an authentication exception because no
token exists, so the client receives HTTP status code 403.
In short, this patch changes the HTTP semantics when the #namenoderpcaddress parameter is absent.
I have created another ticket, HADOOP-15883, to track this issue.
Thanks again [~ayushtkn], and sorry that I did not catch this in time when checking the failed UT in my local
environment.
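The change in HTTP semantics described above can be pictured as an exception-to-status mapping: the same request (no #namenoderpcaddress) now fails at a different step, so a different exception surfaces first. The sketch below is a hypothetical illustration only, not the DataNode's actual WebHDFS exception handling; {{toHttpStatus}} is an invented helper, and java.lang.SecurityException is an assumed stand-in for whatever authentication exception the DataNode really raises when no token exists.

```java
public class WebHdfsStatusSketch {

    // Hypothetical helper: maps the first exception raised while serving a
    // WebHDFS CREATE to an HTTP status code. Not the DataNode's real handler.
    static int toHttpStatus(Throwable t) {
        if (t instanceof IllegalArgumentException) {
            // Before the patch: buildTokenService rejects the missing address.
            return 400; // Bad Request
        }
        if (t instanceof SecurityException) {
            // After the patch: assumed stand-in for the auth failure raised
            // when the wrapped output stream is created without a token.
            return 403; // Forbidden
        }
        return 500; // anything else: Internal Server Error
    }

    public static void main(String[] args) {
        int before = toHttpStatus(new IllegalArgumentException("no namenoderpcaddress"));
        int after = toHttpStatus(new SecurityException("no delegation token"));
        System.out.println("before patch: " + before); // prints "before patch: 400"
        System.out.println("after patch: " + after);   // prints "after patch: 403"
    }
}
```

The request itself is unchanged in both cases; only the point of failure moves, which is why the status code a client observes changes from 400 to 403.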

> Job submitter / executor fail when SBN domain name can not resolved
> -------------------------------------------------------------------
>
>                 Key: HADOOP-15864
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15864
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Critical
>             Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
>         Attachments: HADOOP-15864-branch.2.7.001.patch, HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, HADOOP-15864.branch.2.7.004.patch
>
>
> Job submission and task execution fail if the Standby NameNode domain name cannot be resolved on HDFS HA with the DelegationToken feature enabled.
> This issue is triggered when creating a {{ConfiguredFailoverProxyProvider}} instance, which invokes {{HAUtil.cloneDelegationTokenForLogicalUri}} in HA mode with security enabled. In HDFS HA mode, the UGI needs to hold a separate token for each NameNode in order to handle Active-Standby switches; the contents of these tokens are, of course, identical.
> However, when {{HAUtil.cloneDelegationTokenForLogicalUri}} calls #setTokenService, it checks whether the NameNode address has been resolved. If not, it throws #IllegalArgumentException up the stack, and the job submitter / task executor fails.
> HDFS-8068 and HADOOP-12125 tried to fix this, but I don't think those two tickets resolve it completely.
> Another question many people raise is why a NameNode domain name cannot be resolved. I think there are many scenarios, for instance replacing a node after a fault, or a DNS refresh. In any case, in my opinion a Standby NameNode failure should not impact Hadoop cluster stability.
> a. code ref: org.apache.hadoop.security.SecurityUtil lines 373-386
> {code:java}
>   public static Text buildTokenService(InetSocketAddress addr) {
>     String host = null;
>     if (useIpForTokenService) {
>       if (addr.isUnresolved()) { // host has no ip address
>         throw new IllegalArgumentException(
>             new UnknownHostException(addr.getHostName())
>         );
>       }
>       host = addr.getAddress().getHostAddress();
>     } else {
>       host = StringUtils.toLowerCase(addr.getHostName());
>     }
>     return new Text(host + ":" + addr.getPort());
>   }
> {code}
> b. exception log ref:
> {code:xml}
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:761)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:691)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.<init>(ChRootedFileSystem.java:106)
> at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
> at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
> at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:377)
> at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
> ... 35 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498)
> ... 58 more
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: standbynamenode
> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:390)
> at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:369)
> at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:317)
> at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:132)
> at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:84)
> ... 62 more
> Caused by: java.net.UnknownHostException: standbynamenode
> ... 67 more
> {code}
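To make the root cause in the trace above concrete, here is a minimal, self-contained Java sketch of the quoted {{buildTokenService}} logic. Assumptions: it returns a String instead of Hadoop's {{Text}}, the use-IP setting is hard-coded to true, and {{cloneServicesForNameNodes}} is a hypothetical simplification of building one token service per NameNode, not the real {{HAUtil}} code. It shows how a single unresolved Standby address aborts the whole loop with the same IllegalArgumentException wrapping UnknownHostException.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenServiceSketch {
    // Assumed value of the "use IP for token service" setting in this sketch.
    static final boolean USE_IP_FOR_TOKEN_SERVICE = true;

    // Mirrors the quoted SecurityUtil#buildTokenService, returning a String.
    static String buildTokenService(InetSocketAddress addr) {
        String host;
        if (USE_IP_FOR_TOKEN_SERVICE) {
            if (addr.isUnresolved()) { // host has no IP address
                throw new IllegalArgumentException(new UnknownHostException(addr.getHostName()));
            }
            host = addr.getAddress().getHostAddress();
        } else {
            host = addr.getHostName().toLowerCase(Locale.ROOT);
        }
        return host + ":" + addr.getPort();
    }

    // Hypothetical: one token service per NameNode; any unresolved
    // address throws and aborts the loop for all NameNodes.
    static List<String> cloneServicesForNameNodes(List<InetSocketAddress> nnAddrs) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress addr : nnAddrs) {
            services.add(buildTokenService(addr));
        }
        return services;
    }

    public static void main(String[] args) {
        // A literal IP needs no DNS lookup, so the "active" address is resolved.
        InetSocketAddress active = new InetSocketAddress("127.0.0.1", 8020);
        // Simulates a Standby NameNode whose name DNS cannot resolve.
        InetSocketAddress standby = InetSocketAddress.createUnresolved("standbynamenode", 8020);
        try {
            System.out.println(cloneServicesForNameNodes(List.of(active, standby)));
        } catch (IllegalArgumentException e) {
            // prints "failed: java.net.UnknownHostException: standbynamenode"
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Even though the active NameNode resolves fine, the unresolved standby entry makes the whole call throw, which matches the submitter/executor failure in the trace.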



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

