reef-dev mailing list archives

From Rogan Carr <rogan.c...@gmail.com>
Subject Re: Help on reef error messages from log output
Date Tue, 01 Aug 2017 22:30:55 GMT
Hi All,

On this:
>> 2. Not sure about UnauthorizedAccessException, it is trying to get
>> PerformanceCounter. It is a warning not error. Does it impact the running
>> result?

On HDInsight, we don't have permission to query the status of the machine
the container is running on, so this warning is logged and the performance
counters are not reported. It doesn't impact the run -- it just makes it
harder to get performance diagnostics back.
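
To illustrate the pattern described above (try to read a machine metric, log a
warning if access is denied, and carry on without it), here is a minimal Java
sketch. It is not REEF's actual code: the class name, `tryReadCpuUsage`, and
`describe` are hypothetical, and `SecurityException` only stands in for the
.NET `UnauthorizedAccessException` seen in the log.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

public class MachineStatusSketch {
    private static final Logger LOG =
        Logger.getLogger(MachineStatusSketch.class.getName());

    /**
     * Hypothetical helper: returns the reading if the counter is accessible,
     * otherwise logs a warning and returns empty instead of failing the run.
     */
    public static Optional<Double> tryReadCpuUsage(Supplier<Double> counter) {
        try {
            return Optional.ofNullable(counter.get());
        } catch (SecurityException e) {
            // Access denied: a warning only; the caller keeps running.
            LOG.log(Level.WARNING, "Cannot obtain machine status due to error", e);
            return Optional.empty();
        }
    }

    /** Formats the status line that would accompany a heartbeat. */
    public static String describe(Optional<Double> cpu) {
        return cpu.map(v -> "cpu=" + v + "%").orElse("cpu=unavailable");
    }

    public static void main(String[] args) {
        // Simulates a host where the counter cannot be read.
        Supplier<Double> denied = () -> {
            throw new SecurityException("Access to the registry key 'Global' is denied.");
        };
        System.out.println(describe(tryReadCpuUsage(denied)));      // cpu=unavailable
        System.out.println(describe(tryReadCpuUsage(() -> 12.5)));  // cpu=12.5%
    }
}
```

The point of the sketch is only that the failure is contained at the read
site, so the task result is unaffected and the diagnostics are simply missing.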

Best,
Rogan

On Tue, Aug 1, 2017 at 3:20 PM, Julia Wang (QIUHE) <
Qiuhe.Wang@microsoft.com.invalid> wrote:

> 1. Errors like the one below can be ignored. They occur because the .NET
> code cannot parse the Java config name; it then retries using the alias.
> ERROR: ExceptionThrowing TangApplicationException Encountered error
> [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to
> get Type from the name provided: org.apache.reef.runtime.
> common.evaluator.parameters.ApplicationIdentifier]
>
> 2. Not sure about UnauthorizedAccessException, it is trying to get
> PerformanceCounter. It is a warning not error. Does it impact the running
> result?
>
> 3. "Unable to write data to the transport connection: An existing
> connection was forcibly closed by the remote host." This is an error we
> always see in the log after the driver and evaluator have disconnected. If
> the driver has shut down and the tasks are completed, it is normal. But the
> error below includes an exception in the CompletedTask event, so it looks
> like something is wrong.
>
> Julia
>
> -----Original Message-----
> From: Stephen Weller [mailto:sweller@microsoft.com.INVALID]
> Sent: Tuesday, August 1, 2017 2:42 PM
> To: dev@reef.apache.org
> Cc: Doug Service <dougse@microsoft.com>
> Subject: Help on reef error messages from log output
>
> In attempting to run our reef application in 'yarn' mode on our HDI
> cluster we are getting some exceptions that seem strange. Can anyone help
> debug these or suggest what we should check on our end?
>
>
> 1).  At the start of the output from the worker node, we are seeing some
> TangApplication exceptions like these:
>
> Container: container_1501218565459_0005_01_000004 on
> workernode0.reefhdijulia1.g10.internal.cloudapp.net_45454
> ============================================================
> ====================================================
> LogType:evaluator.stderr
> Log Upload Time:Mon Jul 31 22:08:42 +0000 2017
> LogLength:0
> Log Contents:
> End of LogType:evaluator.stderr
>
> LogType:evaluator.stdout
> Log Upload Time:Mon Jul 31 22:08:42 +0000 2017
> LogLength:37013
> Log Contents:
> Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 :
> 2017-07-31T22:08:26.7020343+00:00 0001
> ERROR: ExceptionThrowing TangApplicationException Encountered error
> [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to
> get Type from the name provided: org.apache.reef.runtime.
> common.evaluator.parameters.ApplicationIdentifier]
> Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 :
> 2017-07-31T22:08:26.7176597+00:00 0001
> ERROR: ExceptionThrowing TangApplicationException Encountered error
> [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to
> get Type from the name provided: org.apache.reef.runtime.
> common.evaluator.parameters.DriverRemoteIdentifier]
> Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 :
> 2017-07-31T22:08:26.7176597+00:00 0001
> ERROR: ExceptionThrowing TangApplicationException Encountered error
> [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to
> get Type from the name provided: org.apache.reef.runtime.
> common.evaluator.parameters.EvaluatorConfiguration]
> Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 :
> 2017-07-31T22:08:26.7176597+00:00 0001
> ERROR: ExceptionThrowing TangApplicationException Encountered error
> [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to
> get Type from the name provided: org.apache.reef.runtime.
> common.evaluator.parameters.EvaluatorIdentifier]
>
>
> Further down is this exception:
> WARNING: ExceptionCaught UnauthorizedAccessException Cannot obtain machine
> status due to error Encountered error [System.UnauthorizedAccessException:
> Access to the registry key 'Global' is denied.
>    at Microsoft.Win32.RegistryKey.Win32Error(Int32 errorCode, String str)
>    at Microsoft.Win32.RegistryKey.InternalGetValue(String name, Object
> defaultValue, Boolean doNotExpand, Boolean checkSecurity)
>    at Microsoft.Win32.RegistryKey.GetValue(String name)
>    at System.Diagnostics.PerformanceMonitor.GetData(String item)
>    at System.Diagnostics.PerformanceCounterLib.GetPerformanceData(String
> item)
>    at System.Diagnostics.PerformanceCounterLib.get_CategoryTable()
>    at System.Diagnostics.PerformanceCounterLib.CounterExists(String
> category, String counter, Boolean& categoryExists)
>    at System.Diagnostics.PerformanceCounterLib.CounterExists(String
> machine, String category, String counter)
>    at System.Diagnostics.PerformanceCounter.InitializeImpl()
>    at System.Diagnostics.PerformanceCounter.NextSample()
>    at System.Diagnostics.PerformanceCounter.NextValue()
>    at Org.Apache.REEF.Common.Runtime.MachineStatus.get_
> CurrentNodeCpuUsage()
>    at Org.Apache.REEF.Common.Runtime.MachineStatus.ToString()]
> Org.Apache.REEF.Common.Runtime.Evaluator.HeartBeatManager Stop: 0 :
> 2017-07-31T22:08:27.1551796+00:00 0001
> EXIT: 7/31/2017 10:08:27 PM HeartBeatManager::HeartBeatManager. Duration:
> [00:00:00.0192656].
>
>
> We are running the reef application as a superuser on the cluster with
> full admin privileges...
>
> Any thoughts on why we are seeing these errors?
>
>
>
> 2).    We are also getting a severe exception returned from the Bridge by
> the CLR:
>
> Jul 31, 2017 10:08:34 PM org.apache.reef.wake.remote.transport.netty.AbstractNettyEventListener
> exceptionCaught
> WARNING: ExceptionEvent: local: /10.2.0.8:9769 remote: /10.2.0.8:53595 ::
> java.io.IOException: An existing connection was forcibly closed by the
> remote host
> Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler
> onNext
> INFO: Completed task: SpinTask
> Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler
> onNext
> INFO: Return results to the client:
> ReturnValue
> Jul 31, 2017 10:08:36 PM org.apache.reef.runtime.common.driver.client.LoggingJobStatusHandler
> onNext
> INFO: In-process JobStatus:
> identifier: "Fluid"
> state: RUNNING
> message: "\254\355\000\005t\000\vReturnValue"
>
> Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler
> onNext
> INFO: CLR CompletedTaskHandler handler set, handling things with CLR
> handler.
> Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.NativeBridge onError
> SEVERE: Bridge received error from CLR: Exception in
> Call_ClrSystemCompletedTask_OnNext
> Unable to write data to the transport connection: An existing connection
> was forcibly closed by the remote host.
>    at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset,
> Int32 size)
>    at Org.Apache.REEF.Wake.Remote.Impl.Channel.Write(Byte[] message)
>    at Org.Apache.REEF.Wake.Remote.Impl.Link`1.Write(T value)
>    at Org.Apache.REEF.Wake.Remote.Impl.TransportClient`1.Send(T message)
>    at Org.Apache.REEF.Wake.Remote.Impl.DefaultRemoteManager`1.ProxyObserver.OnNext(T
> message)
>    at Org.Apache.REEF.Network.NetworkService.NsConnection`1.Write(T
> message)
>    at Org.Apache.REEF.Fluid.Network.MessageService.Send(Object message)
>    at Org.Apache.REEF.Fluid.DriverHandler.OnNext(ICompletedTask value)
>    at Org.Apache.REEF.Driver.Bridge.ClrSystemHandler`1.OnNext(T value)
>    at Org.Apache.REEF.Driver.Bridge.ClrSystemHandlerWrapper.Call_
> ClrSystemCompletedTask_OnNext(UInt64 handle, ICompletedTaskClr2Java
> clr2Java)
>    at Java_org_apache_reef_javabridge_NativeInterop_
> clrSystemCompletedTaskHandlerOnNext(JNIEnv_* env, _jclass* cls, Int64
> handler, _jobject* jcompletedTask, _jobject* jlogger) Inner Exception:
> An existing connection was forcibly closed by the remote host
>    at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset,
> Int32 size) Inner Exception: null Jul 31, 2017 10:08:36 PM
> org.apache.reef.runtime.common.driver.DriverStatusManager onError
> WARNING: Shutting down the Driver with an exception:
> java.lang.RuntimeException: Bridge received error from CLR: Exception in
> Call_ClrSystemCompletedTask_OnNext
> Unable to write data to the transport connection: An existing connection
> was forcibly closed by the remote host.
>    at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset,
> Int32 size)
>    at Org.Apache.REEF.Wake.Remote.Impl.Channel.Write(Byte[] message)
>    at Org.Apache.REEF.Wake.Remote.Impl.Link`1.Write(T value)
>    at Org.Apache.REEF.Wake.Remote.Impl.TransportClient`1.Send(T message)
>    at Org.Apache.REEF.Wake.Remote.Impl.DefaultRemoteManager`1.ProxyObserver.OnNext(T
> message)
>    at Org.Apache.REEF.Network.NetworkService.NsConnection`1.Write(T
> message)
>    at Org.Apache.REEF.Fluid.Network.MessageService.Send(Object message)
>    at Org.Apache.REEF.Fluid.DriverHandler.OnNext(ICompletedTask value)
>    at Org.Apache.REEF.Driver.Bridge.ClrSystemHandler`1.OnNext(T value)
>    at Org.Apache.REEF.Driver.Bridge.ClrSystemHandlerWrapper.Call_
> ClrSystemCompletedTask_OnNext(UInt64 handle, ICompletedTaskClr2Java
> clr2Java)
>    at Java_org_apache_reef_javabridge_NativeInterop_
> clrSystemCompletedTaskHandlerOnNext(JNIEnv_* env, _jclass* cls, Int64
> handler, _jobject* jcompletedTask, _jobject* jlogger) Inner Exception:
> An existing connection was forcibly closed by the remote host
>    at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset,
> Int32 size) Inner Exception: null
>                 at org.apache.reef.javabridge.NativeBridge.onError(
> NativeBridge.java:36)
>                 at org.apache.reef.javabridge.NativeInterop.
> clrSystemCompletedTaskHandlerOnNext(Native Method)
>                 at org.apache.reef.javabridge.generic.JobDriver$
> CompletedTaskHandler.onNext(JobDriver.java:397)
>                 at org.apache.reef.javabridge.generic.JobDriver$
> CompletedTaskHandler.onNext(JobDriver.java:378)
>                 at org.apache.reef.runtime.common.utils.
> BroadCastEventHandler.onNext(BroadCastEventHandler.java:40)
>                 at org.apache.reef.util.ExceptionHandlingEventHandler.
> onNext(ExceptionHandlingEventHandler.java:46)
>                 at org.apache.reef.runtime.common.utils.
> DispatchingEStage$1.onNext(DispatchingEStage.java:72)
>                 at org.apache.reef.runtime.common.utils.
> DispatchingEStage$1.onNext(DispatchingEStage.java:69)
>                 at org.apache.reef.wake.impl.ThreadPoolStage$1.run(
> ThreadPoolStage.java:182)
>                 at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:471)
>                 at java.util.concurrent.FutureTask.run(FutureTask.
> java:262)
>                 at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>                 at java.lang.Thread.run(Thread.java:745)
>
>
>
> Any pointers you can provide are appreciated as always...
>
>
> Thanks!
>
>
> Stephen Weller
