reef-dev mailing list archives

From Boris Shulman <shulm...@gmail.com>
Subject Re: Issues with running on yarn
Date Wed, 13 Apr 2016 04:58:38 GMT
Markus,

Can you please also try it and see what error you are getting? Dhruv is
getting a NoSuchMethodError, which usually indicates an incompatible jar file. I,
on the other hand, have a different failure without any significant stack
trace, and when I ran from VS it actually succeeded. I tried adding more
traces but don't see anything, so I am out of ideas here.
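
One way to test the incompatible-jar theory is to check whether the jars on the driver classpath carry different copies of the same class. The following is only a rough sketch, not part of REEF: the helper names `find_class_copies` and `has_conflict` are made up for illustration, and it relies only on the fact that a jar is a zip archive, so Python's standard `zipfile` module can read it.

```python
import zipfile
from collections import defaultdict


def find_class_copies(jar_paths, class_entry):
    """Group jars by the CRC-32 of one class entry.

    Returns {crc: [jar, ...]}. Jars that do not contain the entry are skipped.
    More than one key means the classpath holds differing copies of the class.
    """
    copies = defaultdict(list)
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            try:
                info = zf.getinfo(class_entry)
            except KeyError:
                continue  # this jar does not carry the class
            copies[info.CRC].append(jar)
    return dict(copies)


def has_conflict(copies):
    """True when at least two different builds of the class were found."""
    return len(copies) > 1
```

To use it, pass the jar paths from the `-Djava.class.path` option in the driver log and an entry name such as `org/apache/reef/javabridge/NativeInterop.class`. If `has_conflict` returns true, one of the jars is probably stale. Matching checksums do not rule out a stale native bridge DLL, which this sketch does not inspect.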

On Tue, Apr 12, 2016 at 9:45 PM, Dhruv Mahajan <dhruv.mahajan@gmail.com>
wrote:

> Boris
> As discussed over chat, I did deep cleaning and recompiled. The code
> compiled successfully for me. However, the same errors in driver persist.
>
> Dhruv
>
> On Tue, Apr 12, 2016 at 8:05 PM, Dhruv Mahajan <dhruv.mahajan@gmail.com>
> wrote:
>
> > Boris
> >
> > Unfortunately I have no experience with this part of REEF to help you
> > with :( .
> >
> > Dhruv
> >
> > On Tue, Apr 12, 2016 at 8:02 PM, Boris Shulman <shulmanb@gmail.com>
> wrote:
> >
> >> I am not convinced I am getting the same failure. The last trace I see is:
> >>
> >> INFO: StartStateHandler: Driver started with endpoint identifier [socket://127.0.0.1:9852] and StartTime
> >> [org.apache.reef.wake.time.event.StartTime[1460516319005]]
> >>
> >> Apr 12, 2016 7:58:39 PM org.apache.reef.javabridge.generic.JobDriver
> >> setupBridge
> >>
> >> INFO: Initializing CLRBufferedLogHandler...
> >>
> >> Apr 12, 2016 7:58:39 PM org.apache.reef.javabridge.generic.JobDriver
> >> setupBridge
> >>
> >> WARNING: CLRBufferedLogHandler could not be initialized
> >>
> >> And the only thing I see in the crash dump is:
> >>
> >> Stack: [0x000000d51b8f0000,0x000000d51b9f0000]
> >> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> >> j  org.apache.reef.javabridge.NativeInterop.callClrSystemOnStartHandler(Ljava/lang/String;Ljava/lang/String;Lorg/apache/reef/javabridge/BridgeHandlerManager;Lorg/apache/reef/javabridge/EvaluatorRequestorBridge;)V+0
> >> j  org.apache.reef.javabridge.generic.DriverStartClrHandlersInitializer.getClrHandlers(Ljava/lang/String;Lorg/apache/reef/javabridge/EvaluatorRequestorBridge;)Lorg/apache/reef/javabridge/BridgeHandlerManager;+18
> >> j  org.apache.reef.javabridge.generic.JobDriver.setupBridge(Lorg/apache/reef/javabridge/generic/ClrHandlersInitializer;)V+230
> >> j  org.apache.reef.javabridge.generic.JobDriver.access$1500(Lorg/apache/reef/javabridge/generic/JobDriver;Lorg/apache/reef/javabridge/generic/ClrHandlersInitializer;)V+2
> >> j  org.apache.reef.javabridge.generic.JobDriver$StartHandler.onNext(Lorg/apache/reef/wake/time/event/StartTime;)V+34
> >> j  org.apache.reef.javabridge.generic.JobDriver$StartHandler.onNext(Ljava/lang/Object;)V+5
> >> j  org.apache.reef.runtime.common.driver.DriverStartHandler.onStart(Lorg/apache/reef/wake/time/event/StartTime;)V+31
> >> j  org.apache.reef.runtime.common.driver.DriverStartHandler.onNext(Lorg/apache/reef/wake/time/event/StartTime;)V+20
> >> j  org.apache.reef.runtime.common.driver.DriverStartHandler.onNext(Ljava/lang/Object;)V+5
> >> j  org.apache.reef.wake.impl.PubSubEventHandler.onNext(Ljava/lang/Object;)V+125
> >> j  org.apache.reef.wake.time.runtime.RuntimeClock.run()V+177
> >> j  org.apache.reef.runtime.common.REEFLauncher.main([Ljava/lang/String;)V+119
> >> v  ~StubRoutines::call_stub
> >>
> >>
> >> So it looks like it is crashing before constructing EvaluatorRequestor,
> >> where my code is invoked for the first time.
> >> Any ideas?
> >>
> >> On Tue, Apr 12, 2016 at 7:36 PM, Boris Shulman <shulmanb@gmail.com>
> >> wrote:
> >>
> >> > Another interesting observation is that both the HelloReef and IMRU examples
> >> > run perfectly fine when launched from Visual Studio and fail when
> >> > launched from the command line. Any ideas?
> >> >
> >> > On Tue, Apr 12, 2016 at 7:26 PM, Boris Shulman <shulmanb@gmail.com>
> >> wrote:
> >> >
> >> >> I wonder why our functional tests run in a different way from the
> >> >> examples. Examples fail (even HelloReef on local) while all tests pass.
> >> >>
> >> >> On Tue, Apr 12, 2016 at 7:00 PM, Boris Shulman <shulmanb@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> Also, when I run a clean build I get:
> >> >>>
> >> >>> "C:\work\GitHub\incubator-reef\lang\cs\Org.Apache.REEF.sln" (default target) (1) ->
> >> >>> "C:\work\GitHub\incubator-reef\lang\cs\Org.Apache.REEF.Tang.Tests\Org.Apache.REEF.Tang.Tests.csproj" (default target) (7) ->
> >> >>> (RestorePackages target) ->
> >> >>>   C:\work\GitHub\incubator-reef\lang\cs\.nuget\NuGet.targets(135,9): error : Access to the path 'C:\work\GitHub\incubator-reef\lang\cs\packages\xunit.extensibility.core.2.1.0\lib\portable-net45+win8+wp8+wpa81\xunit.core.xml' is denied. [C:\work\GitHub\incubator-reef\lang\cs\Org.Apache.REEF.Tang.Tests\Org.Apache.REEF.Tang.Tests.csproj]
> >> >>>   C:\work\GitHub\incubator-reef\lang\cs\.nuget\NuGet.targets(135,9): error MSB3073: The command ""C:\work\GitHub\incubator-reef\lang\cs\.nuget\NuGet.exe" install "C:\work\GitHub\incubator-reef\lang\cs\Org.Apache.REEF.Tang.Tests\packages.config" -source ""  -NonInteractive -RequireConsent -solutionDir "C:\work\GitHub\incubator-reef\lang\cs\ "" exited with code 1. [C:\work\GitHub\incubator-reef\lang\cs\Org.Apache.REEF.Tang.Tests\Org.Apache.REEF.Tang.Tests.csproj]
> >> >>>
> >> >>>
> >> >>> The second build succeeded, so maybe the DLLs are not produced on the first build.
> >> >>>
> >> >>> All functional tests pass after the second build. The examples still fail.
> >> >>> Will fix that tonight.
> >> >>>
> >> >>> On Tue, Apr 12, 2016 at 6:07 PM, Boris Shulman <shulmanb@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>>> This is strange. Both Markus and I tested that. Are you sure you built
> >> >>>> the bridge code? This method was added there.
> >> >>>> ------------------------------
> >> >>>> From: Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com>
> >> >>>> Sent: ‎4/‎12/‎2016 6:03 PM
> >> >>>> To: dev@reef.apache.org
> >> >>>> Subject: RE: Issues with running on yarn
> >> >>>>
> >> >>>> After a clean build, I got the same error. Pick any REEF functional
> >> >>>> test; it fails at the beginning of Driver start.
> >> >>>>
> >> >>>> ____________________________
> >> >>>> SEVERE: Unable to instantiate the clock
> >> >>>> java.lang.NoSuchMethodError: getDefinedRuntimes
> >> >>>> at org.apache.reef.javabridge.NativeInterop.callClrSystemOnStartHandler(Native Method)
> >> >>>> at org.apache.reef.javabridge.generic.DriverStartClrHandlersInitializer.getClrHandlers(DriverStartClrHandlersInitializer.java:47)
> >> >>>> at org.apache.reef.javabridge.generic.JobDriver.setupBridge(JobDriver.java:196)
> >> >>>> at org.apache.reef.javabridge.generic.JobDriver.access$1500(JobDriver.java:66)
> >> >>>> at org.apache.reef.javabridge.generic.JobDriver$StartHandler.onNext(JobDriver.java:583)
> >> >>>> at org.apache.reef.javabridge.generic.JobDriver$StartHandler.onNext(JobDriver.java:577)
> >> >>>> at org.apache.reef.runtime.common.driver.DriverStartHandler.onStart(DriverStartHandler.java:93)
> >> >>>> at org.apache.reef.runtime.common.driver.DriverStartHandler.onNext(DriverStartHandler.java:71)
> >> >>>> at org.apache.reef.runtime.common.driver.DriverStartHandler.onNext(DriverStartHandler.java:40)
> >> >>>> at org.apache.reef.wake.impl.PubSubEventHandler.onNext(PubSubEventHandler.java:98)
> >> >>>> at org.apache.reef.wake.time.runtime.RuntimeClock.run(RuntimeClock.java:217)
> >> >>>> at org.apache.reef.runtime.common.REEFLauncher.main(REEFLauncher.java:175)
> >> >>>> __________________
> >> >>>> Option 0 [-XX:PermSize=128m]
> >> >>>> Option 1 [-XX:MaxPermSize=128m]
> >> >>>> Option 2 [-Xmx512m]
> >> >>>> Option 3 [-Djava.class.path=;C:\reef\ReefApache\reef1\lang\cs\bin\x64\Debug\Org.Apache.REEF.Tests\REEF_LOCAL_RUNTIME6ae4196f\reef-ContextStartDriver-20160412174234260\driver\reef\global\reef-bridge-client-0.15.0-SNAPSHOT-shaded.jar;C:\reef\ReefApache\reef1\lang\cs\bin\x64\Debug\Org.Apache.REEF.Tests\.\reef-bridge-client-0.15.0-SNAPSHOT-shaded.jar]
> >> >>>> Option 4 [-Dproc_reef]
> >> >>>> Found class 'org/apache/reef/javabridge/NativeInterop'<C++>InteropUtil Information: 0 : 2016-04-12T17:43:23.0956095-07:00 0001
> >> >>>> INFO: +Java_org_apache_reef_javabridge_NativeInterop_callClrSystemOnStartHandler
> >> >>>> <C++> Start: 0 : 2016-04-12T17:43:23.1036094-07:00 0001
> >> >>>> START: EvaluatorRequestorClr2Java::EvaluatorRequestorClr2Java
> >> >>>> <C++> Stop: 0 : 2016-04-12T17:43:23.1041093-07:00 0001
> >> >>>> EXIT: EvaluatorRequestorClr2Java::EvaluatorRequestorClr2Java
> >> >>>> <C++> Start: 0 : 2016-04-12T17:43:23.1526168-07:00 0001
> >> >>>> START: EvaluatorRequestorClr2Java::GetDefinedRuntimes
> >> >>>> jmidGetDefinedRuntimes is NULL
> >> >>>> <C++>InteropUtil Error: 0 : 2016-04-12T17:43:23.4991608-07:00 0001
> >> >>>> ERROR: Exceptions in Java_org_apache_reef_javabridge_NativeInterop_callClrSystemOnStartHandlerencountered error [System.ArgumentNullException: Buffer cannot be null.
> >> >>>> Parameter name: buffer
> >> >>>>    at System.IO.MemoryStream..ctor(Byte[] buffer, Boolean writable)
> >> >>>>    at Org.Apache.REEF.Driver.Bridge.Avro.DefinedRuntimesSerializer.FromBytes(Byte[] serializedData)
> >> >>>>    at Org.Apache.REEF.Driver.Bridge.Events.EvaluatorRequestor..ctor(IEvaluatorRequestorClr2Java clr2Java)
> >> >>>>    at Org.Apache.REEF.Driver.Bridge.ClrSystemHandlerWrapper.Call_ClrSystemStartHandler_OnStart(DateTime startTime, String httpServerPort, IEvaluatorRequestorClr2Java evaluatorRequestorClr2Java)
> >> >>>>    at Java_org_apache_reef_javabridge_NativeInterop_callClrSystemOnStartHandler(JNIEnv_* env, _jclass* jclassx, _jstring* dateTimeString, _jstring* httpServerPort, _jobject* jbridgeHandlerManager, _jobject* jevaluatorRequestorBridge)] with mesage [Buffer cannot be null.
> >> >>>> Parameter name: buffer] and stack trace [   at System.IO.MemoryStream..ctor(Byte[] buffer, Boolean writable)
> >> >>>>    at Org.Apache.REEF.Driver.Bridge.Avro.DefinedRuntimesSerializer.FromBytes(Byte[] serializedData)
> >> >>>>    at Org.Apache.REEF.Driver.Bridge.Events.EvaluatorRequestor..ctor(IEvaluatorRequestorClr2Java clr2Java)
> >> >>>>    at Org.Apache.REEF.Driver.Bridge.ClrSystemHandlerWrapper.Call_ClrSystemStartHandler_OnStart(DateTime startTime, String httpServerPort, IEvaluatorRequestorClr2Java evaluatorRequestorClr2Java)
> >> >>>>    at Java_org_apache_reef_javabridge_NativeInterop_callClrSystemOnStartHandler(JNIEnv_* env, _jclass* jclassx, _jstring* dateTimeString, _jstring* httpServerPort, _jobject* jbridgeHandlerManager, _jobject* jevaluatorRequestorBridge)]
> >> >>>>
> >> >>>> -----Original Message-----
> >> >>>> From: Markus Weimer [mailto:markus@weimo.de]
> >> >>>> Sent: Tuesday, April 12, 2016 5:22 PM
> >> >>>> To: dev@reef.apache.org
> >> >>>> Subject: Re: Issues with running on yarn
> >> >>>>
> >> >>>> On 2016-04-12 16:50, Dhruv Mahajan wrote:
> >> >>>> > I see, very weird....Lemme try rerunning again...I was doing:
> >> >>>> >
> >> >>>> > mvn clean install -DskipTests followed by compiling C# code. Lemme retry.
> >> >>>>
> >> >>>> I am very paranoid when it comes to clean builds and execute the
> >> >>>> following as a clean in PowerShell:
> >> >>>>
> >> >>>> ```
> >> >>>> function Clean-REEF{
> >> >>>>    Invoke-Expression 'msbuild $REEFSourcePath\lang\cs\Org.Apache.REEF.sln /m /nr:false /t:Clean'
> >> >>>>    # Perform some more deletes because I don't trust MSBuild
> >> >>>>    Stop-Process -Force -Name MSBuild
> >> >>>>    Get-ChildItem -Path $REEFSourcePath\lang\cs\ -Recurse -Filter obj                | Remove-Item -Recurse
> >> >>>>    Get-ChildItem -Path $REEFSourcePath\lang\cs\ -Recurse -Filter bin                | Remove-Item -Recurse
> >> >>>>    Get-ChildItem -Path $REEFSourcePath\lang\cs\ -Recurse -Filter target             | Remove-Item -Recurse
> >> >>>>    Get-ChildItem -Path $REEFSourcePath\lang\cs\ -Recurse -Filter REEF_LOCAL_RUNTIME | Remove-Item -Recurse
> >> >>>>    Get-ChildItem -Path $REEFSourcePath\lang\cs\ -Recurse -Filter TestResults        | Remove-Item -Recurse
> >> >>>>    Get-ChildItem -Path $REEFSourcePath\lang\cs\ -Recurse -Filter packages           | Remove-Item -Recurse
> >> >>>> }
> >> >>>> ```
> >> >>>>
> >> >>>> This deletes everything, including the downloaded NuGets.
> >> >>>>
> >> >>>> Markus
> >> >>>>
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >
> >
>
