tez-issues mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-1893) Some vertex init fail are still not propagated to clients
Date Mon, 02 Feb 2015 09:25:35 GMT

     [ https://issues.apache.org/jira/browse/TEZ-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-1893:
----------------------------
    Description: 
{code}
          throw new TezUncheckedException(vertex.getLogIdentifier() +
          " has -1 tasks but does not have input initializers, " +
          "1-1 uninited sources or custom vertex manager to set it at runtime");
{code}

{code}
Preconditions.checkState(getContext().getVertexNumTasks(getContext().getVertexName()) == -1,
            "Parallelism for the vertex should be set to -1 if the InputInitializer is setting parallelism"
                + ", VertexName: " + getContext().getVertexName());
Preconditions.checkState(configuredInputName == null,
            "RootInputVertexManager cannot configure multiple inputs. Use a custom VertexManager"
                + ", VertexName: " + getContext().getVertexName() + ", ConfiguredInput: "
                + configuredInputName + ", CurrentInput: " + inputName);
{code}

IMO, this kind of verification could be done on the client side (DAG.verify).
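
As a rough illustration of what such a client-side check might look like, here is a minimal sketch covering only the first case (the "-1 tasks" exception). It does not use the real Tez classes: `ClientSideDagCheck`, `VertexSpec`, and its boolean fields are hypothetical stand-ins, and the actual `DAG.verify` may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types are part of the Tez API.
class ClientSideDagCheck {

    // Simplified stand-in for the vertex properties the AM-side check looks at.
    static final class VertexSpec {
        final String name;
        final int parallelism;                 // -1 means "to be set at runtime"
        final boolean hasInputInitializer;
        final boolean hasOneToOneUninitedSource;
        final boolean hasCustomVertexManager;

        VertexSpec(String name, int parallelism, boolean hasInputInitializer,
                   boolean hasOneToOneUninitedSource, boolean hasCustomVertexManager) {
            this.name = name;
            this.parallelism = parallelism;
            this.hasInputInitializer = hasInputInitializer;
            this.hasOneToOneUninitedSource = hasOneToOneUninitedSource;
            this.hasCustomVertexManager = hasCustomVertexManager;
        }
    }

    // Returns one message per vertex whose parallelism is -1 but which has
    // nothing that could set the parallelism at runtime.
    static List<String> verify(List<VertexSpec> vertices) {
        List<String> errors = new ArrayList<>();
        for (VertexSpec v : vertices) {
            boolean canBeSetAtRuntime = v.hasInputInitializer
                    || v.hasOneToOneUninitedSource
                    || v.hasCustomVertexManager;
            if (v.parallelism == -1 && !canBeSetAtRuntime) {
                errors.add(v.name + " has -1 tasks but does not have input initializers, "
                        + "1-1 uninited sources or custom vertex manager to set it at runtime");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<VertexSpec> dag = Arrays.asList(
                new VertexSpec("tokenizer", -1, true, false, false),
                new VertexSpec("summation", -1, false, false, false));
        // Only "summation" is reported; the client can fail before submitting.
        for (String error : verify(dag)) {
            System.out.println(error);
        }
    }
}
```

With a check along these lines, the misconfiguration would surface as a client-side error before submission, rather than as an AM failure the client cannot see.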

The following messages appear on the client side. The client cannot get the real status of
the DAG because the Tez AM is killed due to this vertex init error:
{code}
19:25:33,716 - Thread( main) - (RMProxy.java:98) - Connecting to ResourceManager at /0.0.0.0:8032
19:25:33,717 - Thread( main) - (AHSProxy.java:42) - Connecting to Application History server at /0.0.0.0:10200
19:25:34,724 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:35,725 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:36,726 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:36,846 - Thread( main) - (DAGClientImpl.java:463) - DAG initialized: CurrentState=Running
19:25:38,351 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:39,352 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:40,354 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:41,356 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:42,357 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:43,358 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:44,359 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:45,360 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:46,361 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:47,362 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19:25:47,369 - Thread( main) - (DAGClientImpl.java:463) - DAG completed. FinalState=FAILED
19:25:47,369 - Thread( main) - (TezWordCount.java:203) - status=FAILED, progress=null, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
, counters=null
19:25:47,372 - Thread( main) - (TezClient.java:470) - Shutting down Tez Session, sessionName=commonName, applicationId=application_1420335690331_0007
19:25:47,374 - Thread( main) - (TezClientUtils.java:838) - Application not running, applicationId=application_1420335690331_0007, yarnApplicationState=FINISHED, finalApplicationStatus=FAILED, trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0

19:25:47,375 - Thread( main) - (TezClient.java:484) - Failed to shutdown Tez Session via proxy
org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1420335690331_0007, yarnApplicationState=FINISHED, finalApplicationStatus=FAILED, trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0

	at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:839)
	at org.apache.tez.client.TezClient.getSessionAMProxy(TezClient.java:669)
	at org.apache.tez.client.TezClient.stop(TezClient.java:476)
	at com.zjffdu.tez.tutorial.TezWordCount.main(TezWordCount.java:204)
19:25:47,377 - Thread( main) - (TezClient.java:489) - Could not connect to AM, killing session via YARN, sessionName=commonName, applicationId=application_1420335690331_0007
19:25:47,381 - Thread( main) - (YarnClientImpl.java:364) - Killed application application_1420335690331_0007
{code}


> Some vertex init fail are still not propagated to clients
> ---------------------------------------------------------
>
>                 Key: TEZ-1893
>                 URL: https://issues.apache.org/jira/browse/TEZ-1893
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
