hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6360) MiniDFSCluster can cause unexpected side effects due to sharing of config
Date Sat, 10 May 2014 22:11:30 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992915#comment-13992915 ]

Kihwal Lee commented on HDFS-6360:
----------------------------------

Here is an example from HDFS-5522's precommit build, in which TestNameNodeRespectsBindHostKeys#testServiceRpcBindHostKey
failed.

The test case starts up a MiniDFSCluster and does a check. The cluster is then shut down, another
one is started with a modified config variable, and the test case does another check.

During the first NN startup, the NN internally sets {{fs.defaultFS}}, {{dfs.namenode.servicerpc-address}},
{{dfs.namenode.rpc-address}}, {{dfs.namenode.http-address}}, {{dfs.namenode.https-address}}
with the port it actually bound to.  MiniDFSCluster#createNameNode() sets these again after
NN startup, mainly for HA/federation with nsid and nnid.

The next time MiniDFSCluster is started, or just the NN is restarted, the config will contain target
binding addresses with a non-zero port number.  Certain configs like {{fs.defaultFS}} are reset
by MiniDFSCluster in certain cases (not all the time!), but {{dfs.namenode.servicerpc-address}}
is left with the real port.  During restart, the NN tries to bind to that specific port for
the service RPC server.
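
A minimal sketch of the leak described above, using a plain HashMap as a stand-in for the shared {{Configuration}} object. The class name and startFakeNameNode() are hypothetical and are not MiniDFSCluster code; the sketch only assumes that a startup rewrites a port-0 address with the port it bound to.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfLeakSketch {
    static final String KEY = "dfs.namenode.servicerpc-address";

    // Hypothetical stand-in for a NN startup: if the address asks for
    // port 0, overwrite it in the given conf with the "bound" port.
    static void startFakeNameNode(Map<String, String> conf, int boundPort) {
        String addr = conf.get(KEY);
        if (addr != null && addr.endsWith(":0")) {
            conf.put(KEY, addr.substring(0, addr.length() - 1) + boundPort);
        }
    }

    // The caller's conf is handed to the startup directly, so the bound
    // port leaks back into it and is seen by the next startup.
    static String afterSharedStartup() {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "127.0.0.1:0");
        startFakeNameNode(conf, 48275);
        return conf.get(KEY);
    }

    // The startup gets its own copy, so the caller's conf keeps port 0.
    static String afterIsolatedStartup() {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "127.0.0.1:0");
        startFakeNameNode(new HashMap<>(conf), 48275);
        return conf.get(KEY);
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + afterSharedStartup());
        System.out.println("isolated: " + afterIsolatedStartup());
    }
}
```

Hadoop's {{Configuration}} has a copy constructor, so a test could hand each startup a new Configuration(conf) in the same way. Whether that is the right fix inside MiniDFSCluster is not something this sketch settles.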

In the failed test case, port 48275 was initially used by the service RPC server. The next time,
the http server was started with 127.0.0.1:0 and happened to use the recently freed 48275.  Because
this port was already taken, the service RPC server failed to start and MiniDFSCluster startup
failed.
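
The collision itself can be reproduced with plain sockets. This is a sketch under assumptions, not NN code: the class and method names are hypothetical, the "http server" is simply a ServerSocket bound to port 0, and the "service RPC server" is a second ServerSocket that tries the port number a stale config would carry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class PortCollisionSketch {
    // Returns true if binding to the stale port is rejected.
    static boolean staleBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // "http server": asks for port 0, so the OS picks a free port.
        try (ServerSocket httpServer = new ServerSocket()) {
            httpServer.bind(new InetSocketAddress(loopback, 0));
            // Pretend the stale config happens to name this same port.
            int stalePort = httpServer.getLocalPort();
            // "service RPC server": insists on the specific stale port.
            try (ServerSocket serviceRpc = new ServerSocket()) {
                serviceRpc.bind(new InetSocketAddress(loopback, stalePort));
                return false; // bind unexpectedly succeeded
            } catch (BindException e) {
                return true;  // address already in use
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stale bind failed: " + staleBindFails());
    }
}
```

In the real failure the overlap depended on which ephemeral port the OS handed out, which is why the test only broke occasionally. The sketch forces the overlap so the outcome does not depend on luck.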



> MiniDFSCluster can cause unexpected side effects due to sharing of config
> -------------------------------------------------------------------------
>
>                 Key: HDFS-6360
>                 URL: https://issues.apache.org/jira/browse/HDFS-6360
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>
> As noted in HDFS-6329 and HDFS-5522, certain use cases of MiniDFSCluster can result in
unexpected results and falsely failing or passing unit tests.
> Since a {{Configuration}} object is shared for all namenode startups, the conf object modified
during one NN startup is passed to the next NN startup.  The effect of this propagation and of
subsequent modifications differs depending on whether it is a single NN cluster, HA cluster
or federation cluster.
> It also depends on what test cases are doing with the config. For example, MiniDFSCluster#getConfiguration(int)
returns the saved conf for the specified NN, but that is not actually the conf object used
by the NN. It merely held the same content at one point in the past, and it is not guaranteed
to stay that way.
> Restarting the same NN can also cause unexpected results. The new NN will switch to the
conf that was cloned & saved AFTER the last startup.  The new NN will start with a changed
config, intentionally or unintentionally.  Config variables such as {{fs.defaultFS}} and {{dfs.namenode.rpc-address}}
will be implicitly set differently from the initial condition.  Some test cases rely on this
and others occasionally break because of this.
> In summary,
> * MiniDFSCluster does not properly isolate configs.
> * Many test cases happen to work most of the time. Correcting MiniDFSCluster will cause mass
breakage of test cases and require fixing them.
> * Many test cases rely on broken behavior and might pass when they should have actually
failed.
> We need to
> * Make MiniDFSCluster behave in a consistent way
> * Provide proper methods and documentation for the correct usage of MiniDFSCluster
> * Fix the unit tests that will be broken after improving MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
