hadoop-hdfs-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
Date Fri, 16 Nov 2018 17:59:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689764#comment-16689764 ]

Erik Krogen edited comment on HDFS-14017 at 11/16/18 5:58 PM:
--------------------------------------------------------------

Hm... something is pretty wrong with Jenkins. It's not actually running any of the tests; it's
failing with errors like:
{code}
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash
or System.exit called?
[ERROR] Command was /bin/sh -c cd /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client
&& /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
-jar /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/target/surefire/surefirebooter375579229167329239.jar
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/target/surefire 2018-11-16T17-42-57_928-jvmRun1
surefire586051240617267363tmp surefire_07291752059438872666tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
{code}
It looks like the last patch where it actually ran tests was v009. We were seeing the same
issue on HDFS-14035, but I don't see it on other Jenkins runs against trunk (rather than the
HDFS-12943 branch).

[~vagarychen], I see you didn't merge trunk in after committing HDFS-14035; I'm going to do
so now, then re-run Jenkins and see if things get better.



> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -------------------------------------------------------------------------
>
>                 Key: HDFS-14017
>                 URL: https://issues.apache.org/jira/browse/HDFS-14017
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: HDFS-14017-HDFS-12943.001.patch, HDFS-14017-HDFS-12943.002.patch,
HDFS-14017-HDFS-12943.003.patch, HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch,
HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, HDFS-14017-HDFS-12943.009.patch,
HDFS-14017-HDFS-12943.010.patch, HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch,
HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends {{ObserverReadProxyProvider}},
> and the only difference is changing the proxy factory to use {{IPFailoverProxyProvider}}.
> However, this is not enough, because when the {{ObserverReadProxyProvider}} constructor
> runs via super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
>         HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all of the configured NN addresses in order to perform configuration-based
> failover. In the case of IP failover, however, this does not really apply (illustrated in the
> first sketch below).
>  
> A second, closely related issue concerns delegation tokens. For example, in the current
> IPFailover setup, say we have a virtual host nn.xyz.com, which points to one of two physical
> nodes, nn1.xyz.com or nn2.xyz.com. In current HDFS there is always only one DT exchanged,
> and it carries the hostname nn.xyz.com. The server only issues this DT, and the client only
> knows the host nn.xyz.com, so all is good. But with Observer reads, even under IPFailover,
> the client no longer contacts nn.xyz.com; it actively reaches out to nn1.xyz.com and
> nn2.xyz.com. During this process, the current code looks for a DT associated with hostname
> nn1.xyz.com or nn2.xyz.com, which differs from the DT issued by the NN, causing token
> authentication to fail. This happens in {{AbstractDelegationTokenSelector#selectToken}}
> (illustrated in the second sketch below). The new IPFailover proxy provider will need to
> resolve this as well.
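
For illustration, a minimal, self-contained Java sketch of the constructor-ordering problem from the first paragraph of the description. The class names are hypothetical stand-ins, not the real Hadoop types; the point is only that super(...) runs before any subclass code, so the subclass gets no chance to skip the multi-address resolution.
{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for ObserverReadProxyProvider: its constructor
// eagerly resolves every configured NN address.
class MultiAddressProviderSketch {
    protected final List<String> nameNodeAddresses;

    MultiAddressProviderSketch(String configuredAddresses) {
        // Stand-in for getProxyAddresses(uri, DFS_NAMENODE_RPC_ADDRESS_KEY).
        nameNodeAddresses = Arrays.asList(configuredAddresses.split(","));
    }
}

// Hypothetical stand-in for ObserverReadProxyProviderWithIPFailover.
class IpFailoverProviderSketch extends MultiAddressProviderSketch {
    IpFailoverProviderSketch(String virtualHost) {
        // super(...) runs first, so multi-address resolution has already
        // happened by this point; under IP failover only the virtual host
        // is configured, so that resolution does not apply.
        super(virtualHost);
    }
}
{code}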
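
And a similarly simplified sketch of the delegation token mismatch. It models token selection as a plain map lookup keyed by service (host:port); the real matching lives in {{AbstractDelegationTokenSelector#selectToken}}, and the port 8020 here is just an assumed example value.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified model of delegation token selection: tokens are looked up by
// the service (host:port) they were issued for.
public class TokenSelectionSketch {
    static String selectToken(Map<String, String> tokensByService, String service) {
        return tokensByService.get(service); // null when no service matches
    }

    public static void main(String[] args) {
        Map<String, String> tokensByService = new HashMap<>();
        // The NN issues a single DT, bound to the virtual host.
        tokensByService.put("nn.xyz.com:8020", "DT-from-NN");

        // Classic IP failover: the client asks for the virtual host -- match.
        System.out.println(selectToken(tokensByService, "nn.xyz.com:8020"));  // DT-from-NN

        // Observer reads: the client contacts the physical hosts directly,
        // so the lookup key no longer matches and authentication fails.
        System.out.println(selectToken(tokensByService, "nn1.xyz.com:8020")); // null
    }
}
{code}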



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
