hadoop-hdfs-issues mailing list archives

From "Tianyin Xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-7726) Parse and check the configuration settings of edit log to prevent runtime errors
Date Mon, 02 Feb 2015 22:38:34 GMT
Tianyin Xu created HDFS-7726:
--------------------------------

             Summary: Parse and check the configuration settings of edit log to prevent runtime errors
                 Key: HDFS-7726
                 URL: https://issues.apache.org/jira/browse/HDFS-7726
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.6.0
            Reporter: Tianyin Xu
            Priority: Minor


============================
Problem
-------------------------------------------------

Similar to the following two issues addressed in 2.7.0:
https://issues.apache.org/jira/browse/YARN-2165
https://issues.apache.org/jira/browse/YARN-2166

The edit-log-related configuration settings should be parsed and checked in the constructor
rather than applied directly at runtime. Applying them unchecked causes runtime failures if
the values are wrong.

Take "dfs.ha.tail-edits.period" as an example. Currently, in EditLogTailer.java, its value
is not checked but used directly in doWork(), as in the following snippet. Any negative
value makes Thread.sleep() throw an IllegalArgumentException (which is not caught) and
impairs the component.


{code:title=EditLogTailer.java|borderStyle=solid}
private void doWork() {
    .....
    Thread.sleep(sleepTimeMs);
    ....
}
{code}
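
To make the failure mode concrete, here is a minimal standalone sketch (not taken from EditLogTailer; the class and method names are hypothetical) showing that Thread.sleep() rejects a negative argument with an IllegalArgumentException, which is what an unchecked negative period would trigger inside doWork().

```java
// Hypothetical demo class; it only exercises java.lang.Thread, not Hadoop.
class NegativeSleepDemo {

    /** Returns true if Thread.sleep() rejects the given sleep time. */
    static boolean sleepRejects(long sleepTimeMs) {
        try {
            Thread.sleep(sleepTimeMs);
            return false; // the value was accepted
        } catch (IllegalArgumentException e) {
            return true; // thrown for negative values
        } catch (InterruptedException e) {
            // Restore the interrupt flag; the value itself was accepted.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("negative rejected: " + sleepRejects(-1000L));
        System.out.println("zero rejected: " + sleepRejects(0L));
    }
}
```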

Another example is "dfs.ha.log-roll.rpc.timeout". Right now, getInt() parses the value at
runtime in getActiveNodeProxy(), which is called by doWork(), as shown below. Any erroneous
setting (e.g., an ill-formatted integer) causes an exception at that point.

{code:title=EditLogTailer.java|borderStyle=solid}
private NamenodeProtocol getActiveNodeProxy() throws IOException {
    .....
    int rpcTimeout = conf.getInt(
          DFSConfigKeys.DFS_HA_LOGROLL_RPC_TIMEOUT_KEY,
          DFSConfigKeys.DFS_HA_LOGROLL_RPC_TIMEOUT_DEFAULT);
    ....
}
{code}
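
The latent nature of this error can be sketched without Hadoop. In the sketch below, a plain Map stands in for the Configuration object and the getInt() helper is a hypothetical stand-in that parses with Integer.parseInt(); the default value is an illustrative placeholder. The point is only that an ill-formatted value sits unnoticed in the configuration until the first call that parses it.

```java
import java.util.Map;

// Hypothetical stand-in; a Map replaces Hadoop's Configuration here.
class LazyParseDemo {
    static final String ROLL_RPC_TIMEOUT_KEY = "dfs.ha.log-roll.rpc.timeout";

    /** Parses the value only when asked, as a getInt-style accessor would. */
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Returns true if reading the timeout fails at the point of use. */
    static boolean failsAtUse(Map<String, String> conf) {
        try {
            getInt(conf, ROLL_RPC_TIMEOUT_KEY, 20000); // placeholder default
            return false;
        } catch (NumberFormatException e) {
            return true; // the bad value only surfaces here, at runtime
        }
    }
}
```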

============================
Solution (the attached patch)
-------------------------------------------------

The idea of the attached patch is to move the parsing and checking logic into the
constructor, exposing errors at initialization so that they are not latent at runtime
(the same approach as YARN-2165 and YARN-2166).

I'm not familiar with the 2.7.0 implementation. It seems there are checking utilities such
as the validatePositiveNonZero function in YARN-2165. If so, we can use them to make the
checking more systematic.
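
For illustration only, the fail-fast idea might look roughly like the sketch below. This is not the attached patch: TailerSketch and parseNonNegative are hypothetical names, a plain Map stands in for Configuration, and the defaults and units (seconds for the period, milliseconds for the timeout) are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical stand-in for a component like EditLogTailer.
class TailerSketch {
    static final String TAIL_PERIOD_KEY = "dfs.ha.tail-edits.period";
    static final String ROLL_RPC_TIMEOUT_KEY = "dfs.ha.log-roll.rpc.timeout";

    final long sleepTimeMs;
    final int rpcTimeout;

    TailerSketch(Map<String, String> conf) {
        // Parse and check once, at construction, so bad values fail fast.
        int periodSec = parseNonNegative(conf, TAIL_PERIOD_KEY, 60); // assumed default
        this.sleepTimeMs = periodSec * 1000L;
        this.rpcTimeout = parseNonNegative(conf, ROLL_RPC_TIMEOUT_KEY, 20000); // assumed default
    }

    /** Hypothetical helper: parse an int setting and reject negative values. */
    static int parseNonNegative(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        if (raw == null) {
            return defaultValue;
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(key + " is not a valid integer: " + raw, e);
        }
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be non-negative, got " + value);
        }
        return value;
    }
}
```

With this shape, a negative or ill-formatted value is reported with the offending key name when the object is constructed, rather than surfacing later inside the worker loop.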





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
