hadoop-hdfs-issues mailing list archives

From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13727) Log full stack trace if DiskBalancer exits with an unhandled exception
Date Fri, 27 Jul 2018 16:52:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559993#comment-16559993 ]

Anu Engineer commented on HDFS-13727:
-------------------------------------

[~xiaochen] Thanks for catching it. I have cherry-picked this to both branch-3.0 and branch-3.1.
Thanks.

 

> Log full stack trace if DiskBalancer exits with an unhandled exception
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13727
>                 URL: https://issues.apache.org/jira/browse/HDFS-13727
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: diskbalancer
>    Affects Versions: 3.0.3
>            Reporter: Stephen O'Donnell
>            Assignee: Gabor Bota
>            Priority: Minor
>             Fix For: 3.2.0, 3.0.4
>
>         Attachments: HDFS-13727.001.patch, HDFS-13727.002.patch
>
>
> In HDFS-13175 it was discovered that when a DN reports the usage on a volume to be greater than the volume capacity, the disk balancer will fail with an unhelpful error:
> {code}
> $ hdfs diskbalancer -report -top 5
> 18/06/11 10:19:43 INFO command.Command: Processing report command
> 18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
> 18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
> 18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException
> {code}
> In HDFS-13175, a change was made to include more details in the exception message, so after the change the code is:
> {code}
>   public void setUsed(long dfsUsedSpace) {
>     Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
>         "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
>         dfsUsedSpace, getCapacity());
>     this.used = dfsUsedSpace;
>   }
> {code}
> There may, however, be other scenarios that cause the balancer to exit with an unhandled exception, and it would be helpful if the tool logged the full stack trace on error rather than just the exception name.
> In DiskBalancerCLI.java, the relevant code is:
> {code}
>   public static void main(String[] argv) throws Exception {
>     DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
>     int res = 0;
>     try {
>       res = ToolRunner.run(shell, argv);
>     } catch (Exception ex) {
>       LOG.error(ex.toString());
>       res = 1;
>     }
>     System.exit(res);
>   }
> {code}
> We should change the error logged in the exception block to log the full stack trace, to give more information on all unhandled errors, e.g.:
> {code}
> LOG.error(ex.toString(), ex);
> {code}
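
The difference between the two logging calls quoted above can be seen in a minimal, self-contained sketch. It is only an illustration under stated assumptions: it uses java.util.logging from the JDK in place of the logger that DiskBalancerCLI actually uses, and the class name StackTraceLoggingSketch and the methods failingTool() and capture() are hypothetical stand-ins, not Hadoop code. The call log.log(Level.SEVERE, msg, ex) plays the role of LOG.error(ex.toString(), ex).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

final class StackTraceLoggingSketch {

  /** Hypothetical stand-in for ToolRunner.run(...) failing without a message. */
  static void failingTool() {
    throw new IllegalArgumentException();
  }

  /** Runs the failing tool, logs the exception, and returns the logger output. */
  static String capture(boolean includeStackTrace) {
    Logger log = Logger.getAnonymousLogger();
    log.setUseParentHandlers(false); // keep output out of the console handler
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    StreamHandler handler = new StreamHandler(out, new SimpleFormatter());
    log.addHandler(handler);
    try {
      failingTool();
    } catch (Exception ex) {
      if (includeStackTrace) {
        // Analogous to LOG.error(ex.toString(), ex): message plus stack trace.
        log.log(Level.SEVERE, ex.toString(), ex);
      } else {
        // Analogous to LOG.error(ex.toString()): class name (and message) only.
        log.severe(ex.toString());
      }
    }
    handler.flush();
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println("--- message only ---");
    System.out.println(capture(false));
    System.out.println("--- with stack trace ---");
    System.out.println(capture(true));
  }
}
```

In the message-only case the output names java.lang.IllegalArgumentException but gives no hint of where it was thrown. When the exception is passed as the last argument, the formatter also prints the stack frames, including the failingTool() frame, which is the extra information the issue asks for.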



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

