hive-issues mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)
Date Wed, 23 Nov 2016 03:18:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688740#comment-15688740 ]

Rui Li commented on HIVE-15168:
-------------------------------

[~zsombor.klara], thanks for the investigation. I also tried adding a sleep in the listener before the state is changed to QUEUED, and the test then fails consistently.
Based on that, I think we have two choices. One is to remove {{verify(listener).onJobQueued(handle)}} from the test, because it's not guaranteed to be called. It seems we can keep {{verify(listener).onJobStarted(handle)}}: at least on the RemoteDriver side, we send JobStarted and JobResult sequentially.
The other is to try to detect the missing state changes. For example, if the current state is SENT and we're told to change to SUCCEEDED, then we must have missed QUEUED and STARTED. We can notify the listeners of the missing state changes before we change to SUCCEEDED.
[~xuefuz], what's your opinion on this?
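Here is a minimal Java sketch of the second option. It assumes a simplified state model: the `State` enum, the `Listener` interface and the `Handle` class are hypothetical stand-ins, not Hive's actual JobHandle API. When the handle is asked to move to SUCCEEDED, it first replays QUEUED and/or STARTED if it never reached them, so listeners still see every transition in order.

```java
import java.util.ArrayList;
import java.util.List;

public class MissedStateDemo {
  // Hypothetical mirror of the states named in the comment; the real
  // JobHandle.State enum may differ. Declaration order encodes progress.
  enum State { SENT, QUEUED, STARTED, SUCCEEDED }

  interface Listener {
    void onStateChange(State newState);
  }

  // Hypothetical handle that back-fills skipped states before a final one.
  static class Handle {
    private State state = State.SENT;
    private final List<Listener> listeners = new ArrayList<>();

    synchronized void addListener(Listener listener) {
      listeners.add(listener);
    }

    synchronized State getState() {
      return state;
    }

    synchronized void changeState(State newState) {
      if (newState == State.SUCCEEDED) {
        // A job can only succeed after being queued and started, so replay
        // any intermediate state we never saw before reporting success.
        for (State missed : new State[] {State.QUEUED, State.STARTED}) {
          if (state.ordinal() < missed.ordinal()) {
            transition(missed);
          }
        }
      }
      transition(newState);
    }

    // Records the new state and notifies every registered listener.
    private void transition(State s) {
      state = s;
      for (Listener listener : listeners) {
        listener.onStateChange(s);
      }
    }
  }

  public static void main(String[] args) {
    Handle handle = new Handle();
    List<State> seen = new ArrayList<>();
    handle.addListener(seen::add);
    // Simulate the race: QUEUED and STARTED were never delivered.
    handle.changeState(State.SUCCEEDED);
    System.out.println(seen); // [QUEUED, STARTED, SUCCEEDED]
  }
}
```

Relying on the enum declaration order keeps the check short. A real fix would also need to decide what to replay for FAILED or CANCELLED, which may legitimately occur before STARTED.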

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> -----------------------------------------------------------
>
>                 Key: HIVE-15168
>                 URL: https://issues.apache.org/jira/browse/HIVE-15168
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already addressed one source of flakiness, but sadly not all of them, it seems.
> In JobHandleImpl the listeners are registered after the job has been submitted.
> This may end up in a race condition.
> {code}
>       // Link the RPC and the promise so that events from one are propagated to the other as
>       // needed.
>       rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>         @Override
>         public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>           if (f.isSuccess()) {
>             handle.changeState(JobHandle.State.QUEUED);
>           } else if (!promise.isDone()) {
>             promise.setFailure(f.cause());
>           }
>         }
>       });
>       promise.addListener(new GenericFutureListener<Promise<T>>() {
>         @Override
>         public void operationComplete(Promise<T> p) {
>           if (jobId != null) {
>             jobs.remove(jobId);
>           }
>           if (p.isCancelled() && !rpc.isDone()) {
>             rpc.cancel(true);
>           }
>         }
>       });
> {code}
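The race can be illustrated without threads by replaying the bad interleaving deterministically. This is a minimal sketch: the `EventSource` class is a hypothetical stand-in for a plain listener list, not Hive or Netty code. It delivers each event only to listeners registered at the moment the event fires, so a listener added after the QUEUED change never sees it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LateListenerDemo {
  // Hypothetical event source that, like a plain listener list, delivers each
  // event only to listeners registered at the moment the event fires.
  static class EventSource {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    synchronized void addListener(Consumer<String> listener) {
      listeners.add(listener);
    }

    synchronized void fire(String event) {
      for (Consumer<String> listener : listeners) {
        listener.accept(event);
      }
    }
  }

  // Runs one of the two orderings deterministically and returns what the
  // listener observed.
  static List<String> run(boolean registerFirst) {
    EventSource source = new EventSource();
    List<String> seen = new ArrayList<>();
    if (registerFirst) {
      source.addListener(seen::add);
    }
    // Stands in for the RPC completing and the handle moving to QUEUED.
    source.fire("QUEUED");
    if (!registerFirst) {
      // Registered too late: QUEUED has already been delivered to nobody.
      source.addListener(seen::add);
    }
    source.fire("STARTED");
    return seen;
  }

  public static void main(String[] args) {
    System.out.println("register late:  " + run(false)); // [STARTED]
    System.out.println("register first: " + run(true));  // [QUEUED, STARTED]
  }
}
```

In the real client the ordering depends on thread scheduling, which is why the test fails only intermittently. The sleep experiment described in the comment forces the late ordering every time.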



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
