reef-dev mailing list archives

From "Sergiy Matusevych (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1782) REEF-on-REEF host driver closes prematurely
Date Tue, 18 Apr 2017 23:54:41 GMT

    [ https://issues.apache.org/jira/browse/REEF-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973764#comment-15973764 ]

Sergiy Matusevych commented on REEF-1782:
-----------------------------------------

To reproduce, make sure you have a Hadoop 2.7.3+ cluster (earlier versions of YARN have a bug
that prevents REEF from running in Unmanaged AM mode), and run
{code}
./bin/run.sh org.apache.reef.examples.reefonreef.Launch
{code}
on Linux, or
{code}
.\bin\runreef.ps1 -VerboseLog -Jars .\lang\java\reef-examples\target\reef-examples-0.16.0-SNAPSHOT-shaded.jar -Class org.apache.reef.examples.reefonreef.Launch
{code}
in Windows PowerShell.

> REEF-on-REEF host driver closes prematurely
> -------------------------------------------
>
>                 Key: REEF-1782
>                 URL: https://issues.apache.org/jira/browse/REEF-1782
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF Driver, REEF Runtime YARN
>         Environment: YARN 2.7.3+
>            Reporter: Sergiy Matusevych
>            Assignee: Sergiy Matusevych
>              Labels: bug, yarn
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The REEF-on-REEF application runs on YARN, and the inner application completes successfully;
> however, the host application's driver closes prematurely and has the {{FAILED/FAILED}} status
> in RM:
> {code}
> $ yarn application -list -appStates ALL
> Application-Id                  Application-Name    Application-Type  User    Queue        State     Final-State  Progress  Tracking-URL
> application_1492554568254_0013  REEF-on-REEF:host   YARN              hadoop  root.hadoop  FAILED    FAILED       100%      http://cisl-linux-070:8088/cluster/app/application_1492554568254_0013
> application_1492554568254_0014  REEF-on-REEF:hello  YARN              hadoop  root.hadoop  FINISHED  SUCCEEDED    100%      N/A
> {code}
> Most likely, that happens because on completion the inner application closes some resources
> that either belong to the host app, or are shared with it.
> Here's a fragment of the driver log:
> {code}
> 2017-04-18 19:15:52,332 INFO reef.examples.reefonreef.ReefOnReefDriver.onNext main | REEF-on-REEF inner job application_1492554568254_0014 completed: state DONE
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.REEFEnvironment.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.wake.time.runtime.RuntimeClock.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.wake.time.runtime.RuntimeClock.close main | RETURN Clock has already been closed
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.launch.REEFErrorHandler.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.utils.RemoteManager.close main | ENTRY
> 2017-04-18 19:15:52,332 FINE reef.wake.remote.impl.DefaultRemoteManagerImplementation.close main | RemoteManager: REEF_UNMANAGED_DRIVER Closing remote manager id: socket://10.200.91.65:16952
> 2017-04-18 19:15:52,332 FINE reef.wake.remote.impl.DefaultRemoteManagerImplementation.close main | RemoteManager: REEF_UNMANAGED_DRIVER already closed
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.utils.RemoteManager.close main | RETURN
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.launch.REEFErrorHandler.close main | RETURN
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.REEFEnvironment.close main | RETURN
> 2017-04-18 19:15:52,332 INFO reef.examples.reefonreef.ReefOnReefDriver.onNext main | REEF-on-REEF host job REEF-on-REEF:host completed: inner app application_1492554568254_0014 status SUBMITTED
> {code}
> i.e. some driver resources have already been closed at the end of the inner app.
> Another good test for that behavior would be running *two* inner applications in Unmanaged
> AM mode sequentially from the same host driver.
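
The suspected failure mode (an inner environment closing a resource that the host still holds) can be illustrated with a small stand-alone Java sketch. This is not REEF code: {{Resource}}, {{Environment}}, and {{SharedCloseSketch}} are hypothetical names invented for illustration, and the sketch only assumes that both environments were handed the same closeable instance, which the log above suggests but does not prove.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedCloseSketch {

  /** Hypothetical stand-in for a shared resource such as a clock or remote manager. */
  static final class Resource implements AutoCloseable {
    private final String name;
    private final List<String> log;
    private boolean closed = false;

    Resource(String name, List<String> log) {
      this.name = name;
      this.log = log;
    }

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      if (closed) {
        // A second close is only reported, similar to "already closed" in the log.
        log.add(name + " already closed");
        return;
      }
      closed = true;
      log.add(name + " closed");
    }
  }

  /** Hypothetical environment that closes whatever resource it was given. */
  static final class Environment implements AutoCloseable {
    private final String name;
    private final Resource resource;
    private final List<String> log;

    Environment(String name, Resource resource, List<String> log) {
      this.name = name;
      this.resource = resource;
      this.log = log;
    }

    @Override
    public void close() {
      log.add(name + " closing");
      resource.close();
    }
  }

  /** Runs the scenario and returns the recorded events in order. */
  static List<String> run() {
    List<String> log = new ArrayList<>();
    Resource shared = new Resource("clock", log);
    Environment host = new Environment("host", shared, log);
    Environment inner = new Environment("inner", shared, log);

    inner.close();                                      // inner app finishes first
    log.add("host sees closed=" + shared.isClosed());   // host is still running here
    host.close();                                       // finds the resource already closed
    return log;
  }

  public static void main(String[] args) {
    run().forEach(System.out::println);
  }
}
```

In this sketch the host observes the shared resource as closed while it is still running, and its own {{close()}} then hits the "already closed" branch. If the real cause is similar, running two inner applications sequentially would be expected to fail on the second one for the same reason.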



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
