hadoop-yarn-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Lowe <jl...@yahoo-inc.com.INVALID>
Subject Re: YARN Client and Unmanaged AM running in the same process?
Date Tue, 24 Jan 2017 22:59:01 GMT
Version 2.7.1.2.3.0.0-2557 is not a standard Apache Hadoop version, nor is the specified git
hash present in the Apache Hadoop repo.  I cannot match the code exactly with what you are
running.
What is happening according to the RM log is this sequence:- The RM is trying to publish system
metrics corresponding to the AM registration and that blows up with a NullPointerException-
Because of that NPE the app attempt state machine never properly transitions out of the the
LAUNCHED state- When the AM tries to unregister it is an error because the app is still in
the LAUNCHED state (due to the incomplete registration)
This looks like the same problem described in YARN-4452.  The version you are reporting implies
it is based on Apache Hadoop 2.7.1, and that problem was fixed in 2.7.3.  Check with your
vendor and see if that fix is included in your version.  I am guessing that it is not and
adding that patch will fix this use case for you.
Jason

 

    On Tuesday, January 24, 2017 1:53 PM, Sergiy Matusevych <sergiy.matusevych@gmail.com>
wrote:
 

 Hi Jason,

Here's what I see in our YARN logs (below). Could it be that I forget to
populate some field in the registration or, say, application/container
launch context?

P.S. YARN version that we are running is: 2.7.1.2.3.0.0-2557 from
9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1 by jenkins source checksum
72da60cd1e57ff22763939f46330224c on 2015-07-14T13:30Z.

Thanks a lot for your help!

Cheers,
Sergiy.


2017-01-23 19:39:40,544 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
new applicationId: 1252
2017-01-23 19:39:40,575 WARN
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific
max attempts: 0 for application: 1252 is invalid, because it is out of the
range [1, 20]. Use the global max attempts instead.
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
with id 1252 submitted by user sergiym
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing
application with id application_1465994698013_1252
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=sergiym
IP=10.130.68.120 OPERATION=Submit Application Request
TARGET=ClientRMService RESULT=SUCCESS APPID=application_1465994698013_1252
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1465994698013_1252 State change from NEW to NEW_SAVING
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
Storing info for app: application_1465994698013_1252
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1465994698013_1252 State change from NEW_SAVING to SUBMITTED
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Application added - appId: application_1465994698013_1252 user: sergiym
leaf-queue of parent: root #applications: 1
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Accepted application application_1465994698013_1252 from user: sergiym, in
queue: default
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1465994698013_1252 State change from SUBMITTED to ACCEPTED
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering app attempt : appattempt_1465994698013_1252_000001
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1465994698013_1252_000001 State change from NEW to SUBMITTED
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application application_1465994698013_1252 from user: sergiym activated in
queue: default
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application added - appId: application_1465994698013_1252 user:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@50ac15c8,
leaf-queue: default #user-pending-applications: 0
#user-active-applications: 1 #queue-pending-applications: 0
#queue-active-applications: 1
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Added Application Attempt appattempt_1465994698013_1252_000001 to scheduler
from user sergiym in queue default
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Storing attempt: AppId: application_1465994698013_1252 AttemptId:
appattempt_1465994698013_1252_000001 MasterContainer: null
2017-01-23 19:39:40,575 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1465994698013_1252_000001 State change from SUBMITTED to
LAUNCHED_UNMANAGED_SAVING
2017-01-23 19:39:40,591 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
Create AMRMToken for ApplicationAttempt:
appattempt_1465994698013_1252_000001
2017-01-23 19:39:40,591 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
Creating password for appattempt_1465994698013_1252_000001
2017-01-23 19:39:40,591 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1465994698013_1252_000001 State change from
LAUNCHED_UNMANAGED_SAVING to LAUNCHED
2017-01-23 19:39:42,250 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
Auth successful for appattempt_1465994698013_1252_000001 (auth:SIMPLE)
2017-01-23 19:39:42,266 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
registration appattempt_1465994698013_1252_000001
2017-01-23 19:39:42,266 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=sergiym
IP=10.130.68.120 OPERATION=Register App Master
TARGET=ApplicationMasterService RESULT=SUCCESS
APPID=application_1465994698013_1252
APPATTEMPTID=appattempt_1465994698013_1252_000001
2017-01-23 19:39:42,266 ERROR
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
handling event type REGISTERED for applicationAttempt
application_1465994698013_1252
java.lang.NullPointerException
at
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:145)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1405)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1380)
at
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:850)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:831)
at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
at
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
at java.lang.Thread.run(Thread.java:745)
2017-01-23 19:39:42,266 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1465994698013_1252 State change from ACCEPTED to RUNNING
2017-01-23 19:39:42,344 ERROR
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
STATUS_UPDATE at LAUNCHED
at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:850)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:831)
at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
at
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
at java.lang.Thread.run(Thread.java:745)
2017-01-23 19:39:42,407 ERROR
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
UNREGISTERED at LAUNCHED
at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:850)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:831)
at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
at
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
at java.lang.Thread.run(Thread.java:745)
2017-01-23 19:39:42,453 INFO org.apache.hadoop.ipc.Server: Socket Reader #1
for port 8032: readAndProcess from client 10.130.68.120 threw exception
[java.io.IOException: An existing connection was forcibly closed by the
remote host]
java.io.IOException: An existing connection was forcibly closed by the
remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:2635)
at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1492)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:782)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:648)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:619)
2017-01-23 19:39:42,453 INFO org.apache.hadoop.ipc.Server: Socket Reader #1
for port 8030: readAndProcess from client 10.130.68.120 threw exception
[java.io.IOException: An existing connection was forcibly closed by the
remote host]
java.io.IOException: An existing connection was forcibly closed by the
remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:2635)
at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1492)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:782)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:648)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:619)
2017-01-23 19:50:59,224 INFO
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor:
Expired:appattempt_1465994698013_1252_000001 Timed out after 600 secs


On Tue, Jan 24, 2017 at 8:48 AM, Jason Lowe <jlowe@yahoo-inc.com> wrote:
>
> Have you looked at the ResourceManager logs to see what it is doing when
it receives the unregister request?  I'm wondering if there's an exception
or error that could explain why it is not working as expected.  The sample
code works against trunk for me.  The unmanaged AM registered,
unregistered, and the final status of the application on the RM was
FINISHED/SUCCEEDED.
>
> Jason
>
>
> On Monday, January 23, 2017 9:51 PM, Sergiy Matusevych <
sergiy.matusevych@gmail.com> wrote:
>
>
> Hi fellow YARN developers,
>
> I am writing a YARN application that runs both Client *and* Unmanaged
> Application Master in the same JVM process. I have a toy example that
> starts a YARN application and the AM in Unmanaged mode, and then just
shuts
> it down:
>
>
https://github.com/apache/reef/blob/master/lang/java/reef-runtime-yarn/src/test/java/org/apache/reef/runtime/yarn/driver/unmanaged/UnmanagedAmTest.java
>
> (I wrapped it as a unit test, but the code is 100% independent of REEF, so
> you can copy & paste it if you want to play with it; I can also build a
> small maven project around it).
>
> The app *almost* works - the problem seems to be that the call on line 117
>
>    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED,
> "Success!", null);
>
> fails to update the status of the application on the Resource Manager
side.
> I would expect it to be FINISHED/SUCCEDED, but instead it stays in
> RUNNING/UNDEFINED well after the client/AM process terminates, and
> eventually got marked as FAILED/FAILED by the RM.
>
> Am I doing something wrong, or is that a bug in YARN? I found an old JIRA
> issue that seems to be related to the problem:
>
> https://issues.apache.org/jira/browse/YARN-273
> "Add an unmanaged AM client for in-process AMs"
>
> Can someone confirm if my problem is indeed related to that issue, or is
> there something wrong with my code?
>
> Thank you!
> Sergiy.
>
>


   
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message