reef-dev mailing list archives

From Boris Shulman <shulm...@gmail.com>
Subject RE: Local runtime Evaluator exits before PID is written
Date Mon, 29 Feb 2016 07:09:52 GMT
Yes. Note that it does not affect the result. 

-----Original Message-----
From: "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com>
Sent: 2/28/2016 11:07 PM
To: "dev@reef.apache.org" <dev@reef.apache.org>
Subject: RE: Local runtime Evaluator exits before PID is written

Are you running HelloREEF with local runtime using master code? 

-----Original Message-----
From: Boris Shulman [mailto:shulmanb@gmail.com] 
Sent: Sunday, February 28, 2016 10:53 PM
To: dev@reef.apache.org
Subject: Re: Local runtime Evaluator exits before PID is written

It actually does. Haven't noticed that before. Just ran master on my laptop and got the same:

SEVERE: Unable to kill the process.
java.io.FileNotFoundException: C:\work\GitHub\incubator-reef\lang\java\reef-examples\target\REEF_LOCAL_RUNTIME\HelloREEF-1456728708940\Node-2-1456728710592\PID.txt (The system cannot find the file specified)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at org.apache.reef.runtime.local.process.RunnableProcess.readPID(RunnableProcess.java:240)
        at org.apache.reef.runtime.local.process.RunnableProcess.cancel(RunnableProcess.java:218)
        at org.apache.reef.runtime.local.driver.ProcessContainer.close(ProcessContainer.java:181)
        at org.apache.reef.runtime.local.driver.ContainerManager.release(ContainerManager.java:377)
        at org.apache.reef.runtime.local.driver.ResourceManager.onResourceReleaseRequest(ResourceManager.java:135)
        at org.apache.reef.runtime.local.driver.LocalResourceReleaseHandler.onNext(LocalResourceReleaseHandler.java:45)
        at org.apache.reef.runtime.local.driver.LocalResourceReleaseHandler.onNext(LocalResourceReleaseHandler.java:32)
        at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager$1.onNext(EvaluatorManager.java:246)
        at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager$1.onNext(EvaluatorManager.java:243)
        at org.apache.reef.wake.time.event.Alarm.handle(Alarm.java:37)
        at org.apache.reef.wake.time.runtime.RuntimeClock.run(RuntimeClock.java:250)
        at org.apache.reef.runtime.common.REEFLauncher.main(REEFLauncher.java:174)
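
To illustrate the failure mode in the trace above: the kill path tries to open PID.txt and fails because the file is not there. One way to avoid the SEVERE log would be to treat a missing PID file as "nothing to kill". The following is only a minimal sketch under that assumption; `PidFileReader` and `readPid` are hypothetical names and this is not REEF's actual `RunnableProcess.readPID` code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.OptionalLong;

/** Hypothetical helper, for illustration only; not part of REEF. */
public final class PidFileReader {

  private PidFileReader() {
  }

  /**
   * Reads a PID from the given file.
   *
   * Returns an empty OptionalLong when the file does not exist or is empty,
   * so the caller can skip the kill step instead of logging an error.
   * Malformed (non-numeric) content still raises NumberFormatException.
   */
  public static OptionalLong readPid(final Path pidFile) throws IOException {
    try {
      final String content =
          new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8).trim();
      return content.isEmpty()
          ? OptionalLong.empty()
          : OptionalLong.of(Long.parseLong(content));
    } catch (final NoSuchFileException e) {
      // Assumed policy: no PID file means there is no process to kill.
      return OptionalLong.empty();
    }
  }
}
```

A caller would check `readPid(path).isPresent()` before attempting to kill the process. Whether silently skipping the kill is the right policy for the local runtime is a separate question; the sketch only shows the mechanics.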



On Sun, Feb 28, 2016 at 10:39 PM, Markus Weimer <markus@weimo.de> wrote:

> Does the same thing happen in current `master` on the same machine?
>
> Markus
>
> On Fri, Feb 26, 2016 at 5:53 PM, Boris Shulman <shulmanb@gmail.com> wrote:
> > While debugging the multi-runtime implementation I noticed that
> > sometimes the evaluator exits without writing the PID. This happens
> > only when using the simple HelloREEF task that just prints a trace
> > and exits. It doesn't happen when either adding a short sleep to it
> > or using a Task that receives and sends a message back.
> >
> > I added more traces, and I see the following in the evaluator traces
> > when this issue happens:
> >
> > Feb 26, 2016 7:11:48 AM org.apache.reef.runtime.common.REEFLauncher main
> > INFO: Entering REEFLauncher.main().
> > Feb 26, 2016 7:11:49 AM org.apache.reef.util.REEFVersion logVersion
> > INFO: REEF Version: 0.14.0-SNAPSHOT
> > Feb 26, 2016 7:11:49 AM org.apache.reef.runtime.common.evaluator.context.defaults.DefaultContextStartHandler onNext
> > INFO: DefaultContextStartHandler received: org.apache.reef.runtime.common.evaluator.context.ContextStartImpl@25c17335
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.context.ContextRuntime startTask
> > INFO: Started task: HelloREEFTask
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.task.TaskRuntime run
> > INFO: Informing registered EventHandler<TaskStart>.
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.task.TaskRuntime runTask
> > INFO: Calling Task.call() without a memento
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.task.TaskRuntime runTask
> > INFO: Task.call() exited cleanly.
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.task.TaskRuntime run
> > INFO: Task is done.
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.task.TaskRuntime run
> > INFO: Informing registered EventHandler<TaskStop>.
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.context.ContextRuntime close
> > WARNING: Shutting down a task because the underlying context is being closed.
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.task.TaskRuntime close
> > WARNING: Trying to close a task that is in state: DONE. Ignoring.
> > Feb 26, 2016 7:11:50 AM org.apache.reef.runtime.common.evaluator.context.defaults.DefaultContextStopHandler onNext
> > INFO: DefaultContextStopHandler received: org.apache.reef.runtime.common.evaluator.context.ContextStopImpl@4979e4bc
> > Feb 26, 2016 7:11:50 AM org.apache.reef.wake.time.runtime.RuntimeClock close
> > INFO: Clock.close()
> >
> >
> > And the following in the driver traces:
> >
> > INFO: Launching process "Node-1-1456467899870"
> > STDERR can be found in C:\evaluator.stderr
> > STDOUT can be found in C:\evaluator.stdout
> > Feb 26, 2016 6:25:00 AM org.apache.reef.runtime.local.process.ReefRunnableProcessObserver onResourceStatus
> > INFO: Sending resource status: org.apache.reef.runtime.common.driver.resourcemanager.ResourceStatusEventImpl@4225a0f5
> > Feb 26, 2016 6:25:00 AM org.apache.reef.runtime.common.evaluator.PIDStoreStartHandler onNext
> > INFO: Storing pid `4524` in file c:\apps\temp\hdfs\nm-local-dir\usercache\azurenrtrdp\appcache\application_1454222353569_4319\container_1454222353569_4319_01_000001\PID.txt
> > Feb 26, 2016 6:25:02 AM org.apache.reef.runtime.common.driver.task.TaskRepresenter onTaskInit
> > WARNING: Received a INIT message for task with id HelloREEFTask which we have seen before. Ignoring the second message
> > Feb 26, 2016 6:25:02 AM org.apache.reef.examples.hello.HelloMultiDriver$TaskRunningHandler onNext
> > INFO: TaskRuntime: HelloREEFTask
> > Feb 26, 2016 6:25:02 AM org.apache.reef.runtime.common.driver.defaults.DefaultTaskCompletionHandler onNext
> > INFO: Received CompletedTask: CompletedTask{ID='HelloREEFTask'} :: CLOSING context: EvaluatorContext{contextIdentifier='RootContext_Node-1-1456467899870', evaluatorIdentifier='Node-1-1456467899870', parentID=OptionalvNothing}
> > Feb 26, 2016 6:25:02 AM org.apache.reef.runtime.common.driver.defaults.DefaultEvaluatorCompletionHandler onNext
> > INFO: Received CompletedEvaluator: CompletedEvaluator{id='Node-1-1456467899870'}
> > Feb 26, 2016 6:25:02 AM org.apache.reef.runtime.local.driver.ContainerManager release
> > INFO: Releasing Container with containerId [ProcessContainer{containedID='Node-1-1456467899870', nodeID='Node-1', errorHandlerRID='socket://100.115.132.76:18598', folder=.\Node-1-1456467899870', rack=/default-rack}]
> > Feb 26, 2016 6:25:02 AM org.apache.reef.runtime.local.driver.ProcessContainer close
> > WARNING: Force-closing a container that is still running: ProcessContainer{containedID='Node-1-1456467899870', nodeID='Node-1', errorHandlerRID='socket://100.115.132.76:18598', folder=.\Node-1-1456467899870', rack=/default-rack}
> > Feb 26, 2016 6:25:02 AM org.apache.reef.wake.remote.transport.netty.NettyChannelHandler exceptionCaught
> > INFO: Unexpected exception from downstream. channel: [id: 0xe7566f93, /100.115.132.76:56036 => /100.115.132.76:12212] local: /100.115.132.76:12212 remote: /100.115.132.76:56036
> > Feb 26, 2016 6:25:02 AM org.apache.reef.wake.remote.transport.netty.NettyChannelHandler exceptionCaught
> > WARNING: Unexpected exception from downstream.
> > java.io.IOException: An existing connection was forcibly closed by the remote host
> >         at sun.nio.ch.SocketDispatcher.read0(Native Method)
> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >         at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:446)
> >         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:879)
> >         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
> >         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:114)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >         at io.netty.util.concurrent.S

[The entire original message is not included.]