distributedlog-commits mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DL-145) Fix the flaky testServiceTimeout
Date Wed, 28 Dec 2016 07:33:59 GMT

    [ https://issues.apache.org/jira/browse/DL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782289#comment-15782289 ]

Hudson commented on DL-145:

UNSTABLE: Integrated in Jenkins build distributedlog-nightly-build #170 (See [https://builds.apache.org/job/distributedlog-nightly-build/170/])
DL-145: the write requests should be error out immediately even if the (sijieg: rev a4999a890173562593313c7ec2d8989113694415)
* (edit) distributedlog-core/src/main/java/com/twitter/distributedlog/BKAsyncLogWriter.java

> Fix the flaky testServiceTimeout
> --------------------------------
>                 Key: DL-145
>                 URL: https://issues.apache.org/jira/browse/DL-145
>             Project: DistributedLog
>          Issue Type: Test
>          Components: distributedlog-service
>    Affects Versions: 0.4.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
> The TestDistributedLogService#testServiceTimeout case is not stable, e.g. https://builds.apache.org/job/distributedlog-precommit-pullrequest/22/com.twitter$distributedlog-service/testReport/com.twitter.distributedlog.service/TestDistributedLogService/testServiceTimeout/
> I could reproduce it occasionally on my machine. The failures became consistent when I lowered ServiceTimeoutMs from 200 to 150, and the test always passed with a larger value, e.g. 1000 (by the way, my disk is an SSD).
> After digging into it, I found that it is related to a corner case when starting a new log segment.
> In the good case, once the service timeout occurs, the stream status goes ERROR -> CLOSING -> CLOSED. Calling Abortables.asyncAbort aborts the cached log segment, and the writeOp then fails with an injected exception, e.g. a write-cancelled exception.
> In the bad case, no log records have been written yet, so a new log segment is started asynchronously. When the timeout occurs, the segment has not finished starting and therefore is not cached, so asyncAbort has no chance to abort that segment.
> I think raising the timeout value used by the test should be fine for this special corner case.
> I will attach a minor patch later. Any suggestions are welcome.
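The race described in the issue can be reproduced in a small, single-threaded model. This is a sketch under stated assumptions, not DistributedLog code: `SegmentAbortRace`, `Segment`, `Writer`, `asyncAbort()` (a stand-in for `Abortables.asyncAbort`) and the `failPendingOnAbort` switch are all hypothetical names. The model only keeps the shape of the problem: abort can reach a segment only after the segment's asynchronous start has completed and cached it. The `failPendingOnAbort` branch shows one possible way to make pending writes fail anyway; it is not necessarily what the committed change to BKAsyncLogWriter.java does.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical, heavily simplified model of the DL-145 race.
 * None of these classes are DistributedLog APIs.
 */
public class SegmentAbortRace {

    /** A started log segment; aborting it fails the write it holds. */
    static final class Segment {
        final CompletableFuture<Long> write = new CompletableFuture<>();

        void abort() {
            write.completeExceptionally(new CancellationException("write cancelled"));
        }
    }

    /** Writer whose first segment is started asynchronously. */
    static final class Writer {
        // Completed by the caller to simulate the async segment start finishing.
        final CompletableFuture<Segment> segmentStart = new CompletableFuture<>();
        private final boolean failPendingOnAbort; // hypothetical fix; false = behaviour in the report
        private Segment cachedSegment;            // null until the segment has started
        private CompletableFuture<Long> pendingWrite;

        Writer(boolean failPendingOnAbort) {
            this.failPendingOnAbort = failPendingOnAbort;
            // The segment is cached only once its asynchronous start completes.
            segmentStart.thenAccept(seg -> cachedSegment = seg);
        }

        CompletableFuture<Long> write() {
            CompletableFuture<Long> result = new CompletableFuture<>();
            pendingWrite = result;
            // The write cannot complete before the segment exists.
            segmentStart.thenCompose(seg -> seg.write).whenComplete((v, e) -> {
                if (e != null) {
                    result.completeExceptionally(e);
                } else {
                    result.complete(v);
                }
            });
            return result;
        }

        void asyncAbort() {
            if (cachedSegment != null) {
                cachedSegment.abort(); // good case: the segment is cached, so the write fails
            } else if (failPendingOnAbort && pendingWrite != null) {
                // Hypothetical fix: fail the write even though no segment is cached yet.
                pendingWrite.completeExceptionally(new CancellationException("writer aborted"));
            }
            // Otherwise (bad case): nothing to abort, and the write keeps waiting.
        }
    }

    public static void main(String[] args) {
        // Good case: the segment has started before the timeout triggers the abort.
        Writer good = new Writer(false);
        good.segmentStart.complete(new Segment());
        CompletableFuture<Long> w1 = good.write();
        good.asyncAbort();
        System.out.println("good case failed: " + w1.isCompletedExceptionally());

        // Bad case: the timeout fires while the segment is still starting.
        Writer bad = new Writer(false);
        CompletableFuture<Long> w2 = bad.write();
        bad.asyncAbort();
        System.out.println("bad case done: " + w2.isDone());

        // Same timing as the bad case, with the hypothetical fix enabled.
        Writer fixed = new Writer(true);
        CompletableFuture<Long> w3 = fixed.write();
        fixed.asyncAbort();
        System.out.println("fixed case failed: " + w3.isCompletedExceptionally());
    }
}
```

Running `main` prints `good case failed: true`, `bad case done: false` and `fixed case failed: true`. The bad case is why a short ServiceTimeoutMs can make the test flaky. When the timeout lands before the first segment is cached, the write is simply left pending instead of failing.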

This message was sent by Atlassian JIRA
