[ https://issues.apache.org/jira/browse/TEZ-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024431#comment-17024431
]
László Bodor edited comment on TEZ-4119 at 1/27/20 3:51 PM:
------------------------------------------------------------
This can sometimes be reproduced locally by running all tez-dag tests (normally, TestSpeculation
should finish in 3-5 seconds).
From the tez-dag dir:
{code}
mvn --batch-mode clean test -fae
{code}
there is a suspicious no-op of about 1 minute 39 seconds between these two log lines:
{code}
2020-01-27 16:33:22,162 INFO [Dispatcher thread {Central}] app.DAGAppMaster (DAGAppMaster.java:handle(872))
- Completed cleanup for DAG: name=test, with id=dag_1580139200870_0001_1
2020-01-27 16:35:00,864 INFO [Thread-53] client.TezClient (TezClient.java:<init>(210))
- Tez Client Version: [ component=tez-api, version=0.10.1-SNAPSHOT, revision=f049d93a69bdc5bd736301bf7081fa5cef2694bf,
SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2020-01-27T09:27:08Z
]
{code}
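To put an exact number on the gap between the two log lines above, the timestamps can be subtracted directly. This is a minimal sketch, not part of Tez: the class `LogGap` and the method `gapMillis` are hypothetical names, and the pattern assumes the log timestamp layout shown above (`yyyy-MM-dd HH:mm:ss,SSS`).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not in Tez): measures the time between two log timestamps.
final class LogGap {
  // Assumed layout, taken from the log excerpt: date, time, comma, milliseconds.
  private static final DateTimeFormatter LOG_TS =
      DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

  /** Returns the number of milliseconds from 'earlier' to 'later'. */
  static long gapMillis(String earlier, String later) {
    LocalDateTime start = LocalDateTime.parse(earlier, LOG_TS);
    LocalDateTime end = LocalDateTime.parse(later, LOG_TS);
    return Duration.between(start, end).toMillis();
  }

  public static void main(String[] args) {
    long ms = gapMillis("2020-01-27 16:33:22,162", "2020-01-27 16:35:00,864");
    System.out.println(ms + " ms"); // prints "98702 ms"
  }
}
```

For the two lines above this gives 98702 ms, i.e. roughly 1 minute 39 seconds with nothing logged.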
jstacks of the surefire process: [^jstack.log] ...later: [^jstack4.log] ...later: [^jstack6.log]
something strange:
testBasicSpeculationPerVertexConf contains a 200ms sleep, but it took much longer
{code}
"Thread-3" #16 prio=5 os_prio=31 tid=0x00007fe07a3bc800 nid=0xa403 waiting on condition [0x00007000062d8000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.tez.dag.app.TestSpeculation.testBasicSpeculationPerVertexConf(TestSpeculation.java:261)
{code}
10 seconds later:
{code}
"Thread-3" #16 prio=5 os_prio=31 tid=0x00007fe07a3bc800 nid=0xa403 waiting on condition [0x00007000062d8000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.tez.dag.app.TestSpeculation.testBasicSpeculationPerVertexConf(TestSpeculation.java:261)
{code}
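One way to check how long such a sleep really takes is to time it and log the result next to the requested duration. This is only a sketch under assumptions: `SleepTimer` and `timedSleepMillis` are hypothetical names and are not part of TestSpeculation or Tez.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not in Tez): reports how long a sleep actually took.
final class SleepTimer {
  /**
   * Sleeps for requestedMillis and returns the elapsed wall time in milliseconds.
   * If interrupted, restores the interrupt flag and returns the time spent so far.
   */
  static long timedSleepMillis(long requestedMillis) {
    long start = System.nanoTime();
    try {
      Thread.sleep(requestedMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    long requested = 200; // same duration as the sleep mentioned above
    long actual = timedSleepMillis(requested);
    System.out.println("requested=" + requested + "ms actual=" + actual + "ms");
  }
}
```

If the measured value stays close to 200 ms while consecutive thread dumps still show the same frame, the time is being spent around the sleep call rather than inside a single call.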
> TestSpeculation is flaky
> ------------------------
>
> Key: TEZ-4119
> URL: https://issues.apache.org/jira/browse/TEZ-4119
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Attachments: jstack.log, jstack4.log, jstack6.log, org.apache.tez.dag.app.TestSpeculation-output.txt
>
>