tez-issues mailing list archives

From László Bodor (Jira) <j...@apache.org>
Subject [jira] [Comment Edited] (TEZ-4119) TestSpeculation is flaky
Date Mon, 27 Jan 2020 15:52:00 GMT

    [ https://issues.apache.org/jira/browse/TEZ-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024431#comment-17024431 ]

László Bodor edited comment on TEZ-4119 at 1/27/20 3:51 PM:
------------------------------------------------------------

The failure can sometimes be reproduced locally by running all tez-dag tests (normally, TestSpeculation
should finish in 3-5 seconds).
From the tez-dag dir:
{code}
mvn --batch-mode  clean test -fae
{code}

There is a suspicious gap of about 1 minute 40 seconds in which nothing is logged:
{code}
2020-01-27 16:33:22,162 INFO  [Dispatcher thread {Central}] app.DAGAppMaster (DAGAppMaster.java:handle(872))
- Completed cleanup for DAG: name=test, with id=dag_1580139200870_0001_1
2020-01-27 16:35:00,864 INFO  [Thread-53] client.TezClient (TezClient.java:<init>(210))
- Tez Client Version: [ component=tez-api, version=0.10.1-SNAPSHOT, revision=f049d93a69bdc5bd736301bf7081fa5cef2694bf,
SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2020-01-27T09:27:08Z
]
{code}
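
The length of the gap can be read off the two timestamps in the excerpt above. A minimal sketch of that arithmetic, assuming the log layout shown (a 23-character `yyyy-MM-dd HH:mm:ss,SSS` prefix on each entry); `LogGap`, `timestampOf` and `gap` are hypothetical helper names and not part of Tez:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogGap {
    // Timestamp layout at the start of each log entry in the excerpt.
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Parses the leading 23-character timestamp of a log line. */
    static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), LOG_TS);
    }

    /** Returns the time elapsed between two log lines. */
    static Duration gap(String earlier, String later) {
        return Duration.between(timestampOf(earlier), timestampOf(later));
    }

    public static void main(String[] args) {
        // Line prefixes copied from the excerpt; the rest of each entry is omitted.
        String cleanup = "2020-01-27 16:33:22,162 INFO  [Dispatcher thread {Central}] app.DAGAppMaster";
        String nextClient = "2020-01-27 16:35:00,864 INFO  [Thread-53] client.TezClient";
        System.out.println(gap(cleanup, nextClient).toMillis() + " ms"); // prints "98702 ms"
    }
}
```

The same helper could be applied to consecutive lines of a full test output to find the largest silent interval.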

jstacks of the surefire process:  [^jstack.log] ...later:  [^jstack4.log] ...later:  [^jstack6.log]


Something strange:
testBasicSpeculationPerVertexConf contains a 200ms sleep, but it took much longer. The thread is in that sleep in one dump:
{code}
"Thread-3" #16 prio=5 os_prio=31 tid=0x00007fe07a3bc800 nid=0xa403 waiting on condition [0x00007000062d8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.tez.dag.app.TestSpeculation.testBasicSpeculationPerVertexConf(TestSpeculation.java:261)
{code}
and still at the same line 10 seconds later:
{code}
"Thread-3" #16 prio=5 os_prio=31 tid=0x00007fe07a3bc800 nid=0xa403 waiting on condition [0x00007000062d8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.tez.dag.app.TestSpeculation.testBasicSpeculationPerVertexConf(TestSpeculation.java:261)
{code}
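
One possible reading of the two dumps, which the dumps alone cannot confirm, is that the 200ms sleep sits inside a polling loop whose exit condition never becomes true. A minimal sketch under that assumption of a bounded wait that fails with a message instead of sleeping indefinitely; `PollUntil`, `waitFor` and their parameters are hypothetical names and are not taken from TestSpeculation or Tez:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class PollUntil {
    /**
     * Polls the condition every pollMillis until it is true.
     * Throws TimeoutException once timeoutMillis has elapsed without success.
     */
    static void waitFor(BooleanSupplier condition, long pollMillis, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // Hypothetical condition that becomes true about 50 ms after start.
        waitFor(() -> System.nanoTime() - start > 50_000_000L, 10, 1_000);
        System.out.println("condition met");
        try {
            // A condition that never holds hits the deadline instead of hanging.
            waitFor(() -> false, 10, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

With a bound like this, a stuck wait would show up as a test failure with a clear message, not as a surefire process that has to be inspected with jstack.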


was (Author: abstractdog):
sometimes it can be reproduced locally, by running all tez-dag tests (normally, TestSpeculation
should finish in 3-5 seconds):
from tez-dag dir:
{code}
mvn --batch-mode  clean test -fae
{code}

there is a suspicious no-op for almost 2 minutes:
{code}
2020-01-27 16:33:22,162 INFO  [Dispatcher thread {Central}] app.DAGAppMaster (DAGAppMaster.java:handle(872))
- Completed cleanup for DAG: name=test, with id=dag_1580139200870_0001_1
2020-01-27 16:35:00,864 INFO  [Thread-53] client.TezClient (TezClient.java:<init>(210))
- Tez Client Version: [ component=tez-api, version=0.10.1-SNAPSHOT, revision=f049d93a69bdc5bd736301bf7081fa5cef2694bf,
SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2020-01-27T09:27:08Z
]
{code}

jstacks of the surefire process:  [^jstack.log] ...later:  [^jstack6.log] 

> TestSpeculation is flaky
> ------------------------
>
>                 Key: TEZ-4119
>                 URL: https://issues.apache.org/jira/browse/TEZ-4119
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: jstack.log, jstack4.log, jstack6.log, org.apache.tez.dag.app.TestSpeculation-output.txt
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
