hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8163) Using monotonicNow for block report scheduling causes test failures on recently restarted systems
Date Fri, 17 Apr 2015 21:54:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500757#comment-14500757 ]

Arpit Agarwal commented on HDFS-8163:

While working on more tests I found some more issues with the timestamp usage. The [System.nanoTime|https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#nanoTime()]
docs state that it can return a negative value and that the value can overflow between successive invocations.
So two values should never be compared directly. Instead, the sign of their difference should be checked, which handles overflow.
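
To illustrate the point about diffing, here is a minimal sketch (not code from the patch; the class and method names are made up for this example). It simulates a counter that wraps past {{Long.MAX_VALUE}} and shows that a direct comparison gets the ordering wrong while the difference-based check does not.

```java
public class NanoTimeCompare {
    // Overflow-safe ordering check: true if instant a is earlier than instant b.
    // Relies on two's-complement wraparound of long subtraction, and assumes the
    // two instants are less than about 292 years apart.
    static boolean isBefore(long a, long b) {
        return a - b < 0;
    }

    public static void main(String[] args) {
        long t0 = Long.MAX_VALUE - 5; // reading taken just before the counter wraps
        long t1 = t0 + 10;            // 10 ticks later, wrapped to a negative value

        // Direct comparison claims t1 is earlier than t0, which is wrong.
        System.out.println("t1 < t0          : " + (t1 < t0));
        // The difference-based check orders them correctly.
        System.out.println("isBefore(t0, t1) : " + isBefore(t0, t1));
        // The elapsed interval is still correct despite the wrap.
        System.out.println("t1 - t0          : " + (t1 - t0));
    }
}
```

The same reasoning applies to elapsed-time checks: compute {{now - start}} first and compare that result against the interval.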

My guess is that negative values/overflow are unlikely on the platforms we care about but
we should be handling them correctly anyway. I plan to split out the timestamp handling logic
of BPServiceActor into a separate utility class for clarity and ease of unit testing. Will
post an updated patch later today.

> Using monotonicNow for block report scheduling causes test failures on recently restarted systems
> -------------------------------------------------------------------------------------------------
>                 Key: HDFS-8163
>                 URL: https://issues.apache.org/jira/browse/HDFS-8163
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.1
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>            Priority: Blocker
>         Attachments: HDFS-8163.01.patch, HDFS-8163.02.patch
> {{BPServiceActor#blockReport}} has the following check:
> {code}
>   List<DatanodeCommand> blockReport() throws IOException {
>     // send block report if timer has expired.
>     final long startTime = monotonicNow();
>     if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
>       return null;
>     }
> {code}
> Many tests trigger an immediate block report via {{BPServiceActor#triggerBlockReportForTests}}
> which sets {{lastBlockReport = 0}}. However, if the machine was restarted recently then startTime
> may be less than {{dnConf.blockReportInterval}} and the block report is not sent.
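
A small standalone sketch of the failure mode described above (illustrative only; the class name, the helper, and the concrete numbers are assumptions, not taken from {{BPServiceActor}}). It assumes the monotonic clock reads in milliseconds, that its origin is close to boot time, and that the report interval is 6 hours.

```java
public class BlockReportTimerDemo {
    // Mirrors the quoted check: returns true when the block report would be skipped.
    static boolean skipsReport(long now, long lastBlockReport, long intervalMs) {
        return now - lastBlockReport <= intervalMs;
    }

    public static void main(String[] args) {
        long intervalMs = 6L * 60 * 60 * 1000; // assumed interval: 6 hours
        long lastBlockReport = 0;              // value set to force an immediate report

        long longUptime = 10 * intervalMs;     // clock reading on a long-running machine
        long recentRestart = 5L * 60 * 1000;   // clock reading 5 minutes after the origin

        // Long uptime: the elapsed value exceeds the interval, so the report is sent.
        System.out.println("long uptime skips report    : "
            + skipsReport(longUptime, lastBlockReport, intervalMs));
        // Recent restart: the clock itself is below the interval, so the report is skipped.
        System.out.println("recent restart skips report : "
            + skipsReport(recentRestart, lastBlockReport, intervalMs));
    }
}
```

The second case is the one that makes tests fail: resetting the timestamp to 0 only forces a report when the clock reading is already larger than the interval.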
> {{Time#monotonicNow}} uses {{System#nanoTime}} which represents time elapsed since an
> arbitrary origin. The time should be used only for comparison with other values returned by

