distributedlog-dev mailing list archives

From "Liang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DL-145) Fix the flaky testServiceTimeout
Date Fri, 16 Dec 2016 11:30:58 GMT
Liang Xie created DL-145:

             Summary: Fix the flaky testServiceTimeout
                 Key: DL-145
                 URL: https://issues.apache.org/jira/browse/DL-145
             Project: DistributedLog
          Issue Type: Test
          Components: distributedlog-service
    Affects Versions: 0.4.0
            Reporter: Liang Xie
            Assignee: Liang Xie

The TestDistributedLogService#testServiceTimeout case is not stable, e.g. https://builds.apache.org/job/distributedlog-precommit-pullrequest/22/com.twitter$distributedlog-service/testReport/com.twitter.distributedlog.service/TestDistributedLogService/testServiceTimeout/

It can be reproduced on my box occasionally. The failure became consistent when I tuned
ServiceTimeoutMs from 200 down to 150, and the test always passed when it was tuned to a
larger value, e.g. 1000 (btw, my disk is an SSD).

After digging into it, the failure turns out to be related to a corner case in starting a new log segment.
In the good case, once the service timeout occurs, the stream status goes ERROR -> CLOSING -> CLOSED;
calling Abortables.asyncAbort triggers an abort of the cached log segment, and the writeOp
then fails with an exception, e.g. a write cancel exception.
In the bad case, no log records have been written before, so a new log segment is started
asynchronously. When the timeout occurs, the segment start has not finished yet, so nothing
is cached, and asyncAbort has no chance to abort that segment.
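
To make the two cases concrete, here is a minimal toy model of the race. It is not DistributedLog code: the class AbortRaceSketch and all of its methods are hypothetical names, and the only assumption is that an abort can reach a segment only once that segment has been cached.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

// Toy model of the race described above. All names are hypothetical;
// none of this is the real DistributedLog API.
final class AbortRaceSketch {
    // Stands in for the cached log segment; null until the async
    // "start new log segment" step has finished.
    private CompletableFuture<Long> cachedSegmentWrite;

    // The pending write that the test expects to fail after the timeout.
    final CompletableFuture<Long> pendingWrite = new CompletableFuture<>();

    // Models completion of the async segment start: the segment gets cached.
    void finishSegmentStart() {
        cachedSegmentWrite = pendingWrite;
    }

    // Models the abort on service timeout: only a cached segment can be aborted.
    void abortOnTimeout() {
        if (cachedSegmentWrite != null) {
            cachedSegmentWrite.completeExceptionally(
                new CancellationException("write cancelled"));
        }
    }

    // Returns true if the pending write received an exception after the timeout.
    static boolean writeFailsAfterTimeout(boolean segmentReadyBeforeTimeout) {
        AbortRaceSketch stream = new AbortRaceSketch();
        if (segmentReadyBeforeTimeout) {
            stream.finishSegmentStart();
        }
        stream.abortOnTimeout();
        return stream.pendingWrite.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println("good case: " + writeFailsAfterTimeout(true));
        System.out.println("bad case: " + writeFailsAfterTimeout(false));
    }
}
```

In this sketch the good case leaves the write completed exceptionally, while in the bad case the write stays pending because nothing was cached when the abort ran, which matches the flaky behaviour seen with a short timeout.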

I think changing the test timeout value to a larger one should be fine for this special test
corner case.

I will attach a minor patch later. Any suggestions are welcome.

This message was sent by Atlassian JIRA
