impala-issues mailing list archives

From "Dan Hecht (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IMPALA-4631) plan-fragment-executor.cc:518] Check failed: other_time <= total_time (25999394 vs. 25999393)
Date Wed, 15 Mar 2017 19:54:41 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Hecht updated IMPALA-4631:
------------------------------
    Target Version: Product Backlog  (was: Impala 2.9.0)
          Priority: Major  (was: Blocker)

This commit should avoid the DCHECK, but I'm leaving this issue open since we don't yet
understand the root cause.

{code}
commit de12d86f208ceaf75b5d45aca229002002e8f860
Author: Dan Hecht <dhecht@cloudera.com>
Date:   Mon Mar 13 18:24:25 2017 -0700

    IMPALA-4631: avoid DCHECK in PlanFragementExecutor::Close().

    Occasionally, we see other_time == total_time+1 for some reason
    we don't yet understand. We've only seen this on EC2 and with
    CLOCK_MONOTONIC_COARSE, so it could be that clock occasionally
    goes backwards. The intent of the DCHECK is to verify that
    we didn't miss accounting entire intervals of time, so let's
    loosen it slightly to avoid this "false" positive.

    Change-Id: Ia9883fdb1be6a4301864da85da56ec96f4dafbe7
    Reviewed-on: http://gerrit.cloudera.org:8080/6375
    Reviewed-by: Dan Hecht <dhecht@cloudera.com>
    Reviewed-by: Michael Ho <kwho@cloudera.com>
    Tested-by: Impala Public Jenkins
{code}
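
The commit message says the check was loosened "slightly" but does not show the new condition. As a rough sketch only, assuming a tolerance of one nanosecond (which matches the observed other_time == total_time+1), a relaxed comparison could look like the following. The helper name `OtherTimeWithinTotal` and the `slack_ns` parameter are hypothetical and are not taken from the Impala source.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual Impala change: returns true when
// other_time does not exceed total_time by more than slack_ns nanoseconds.
// slack_ns defaults to 1 because the reported failure was off by exactly one.
inline bool OtherTimeWithinTotal(int64_t other_time, int64_t total_time,
                                 int64_t slack_ns = 1) {
  return other_time <= total_time + slack_ns;
}
```

With the values from the failure report, `OtherTimeWithinTotal(25999394, 25999393)` holds, while a larger gap still fails, so the check would keep catching whole missed intervals.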

> plan-fragment-executor.cc:518] Check failed: other_time <= total_time (25999394 vs. 25999393)
> ---------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-4631
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4631
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Dan Hecht
>            Assignee: Dan Hecht
>              Labels: broken-build, flaky
>
> This dcheck occasionally fires:
> {code}
> impalad.FATAL:F1201 22:35:58.617157 30293 plan-fragment-executor.cc:518] Check failed: other_time <= total_time (25999394 vs. 25999393)
> {code}
> I suspect the problem is with using floating point operations in places like this:
> {code}
>     timespec ts;
>     clock_gettime(OsInfo::fast_clock(), &ts);
>     return ts.tv_sec * 1e9 + ts.tv_nsec;
> {code}
> Because floating-point multiplication doesn't distribute over addition, we can end up with
> {noformat} c * (a + b) < c * a + c * b {noformat}, which is effectively the comparison the dcheck makes.
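
To illustrate the suspicion in the issue description (which the thread has not confirmed as the root cause), here is a minimal sketch that contrasts an integer-only nanosecond conversion with one that goes through `double`, as the quoted snippet does via `1e9`. Both function names are hypothetical and are not Impala APIs. A `double` carries 53 bits of mantissa, so nanosecond counts above roughly 9e15 can no longer be represented exactly; small monotonic-clock readings may well stay below that limit.

```cpp
#include <cstdint>
#include <time.h>

// Hypothetical helper: converts a timespec to nanoseconds using only
// 64-bit integer arithmetic, so no rounding can occur.
inline int64_t TimespecToNanos(const timespec& ts) {
  return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper mirroring the quoted snippet: the multiplication by
// 1e9 promotes the whole expression to double before converting back.
inline int64_t TimespecToNanosViaDouble(const timespec& ts) {
  return static_cast<int64_t>(ts.tv_sec * 1e9 + ts.tv_nsec);
}
```

For a wall-clock-sized reading such as `tv_sec = 1480000000`, `tv_nsec = 1` (an assumed value chosen for illustration), the integer version keeps the trailing nanosecond while the `double` version rounds it away. Whether the clock Impala used ever produced values large enough to hit this is not established in the thread.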



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
