From: Stephen Weller
To: "dev@reef.apache.org"
CC: Doug Service
Subject: Help on reef error messages from log output
Date: Tue, 1 Aug 2017 21:42:27 +0000

In attempting to run our reef application in 'yarn' mode on our HDI cluster we are getting some exceptions that seem strange.
Can anyone help debug these or suggest what we should check on our end?

1). At the start of the output from the worker node, we are seeing some TangApplication exceptions like these:

Container: container_1501218565459_0005_01_000004 on workernode0.reefhdijulia1.g10.internal.cloudapp.net_45454
================================================================================================================
LogType:evaluator.stderr
Log Upload Time:Mon Jul 31 22:08:42 +0000 2017
LogLength:0
Log Contents:
End of LogType:evaluator.stderr

LogType:evaluator.stdout
Log Upload Time:Mon Jul 31 22:08:42 +0000 2017
LogLength:37013
Log Contents:
Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 : 2017-07-31T22:08:26.7020343+00:00 0001 ERROR: ExceptionThrowing TangApplicationException
Encountered error [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to get Type from the name provided: org.apache.reef.runtime.common.evaluator.parameters.ApplicationIdentifier]
Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 :
2017-07-31T22:08:26.7176597+00:00 0001 ERROR: ExceptionThrowing TangApplicationException
Encountered error [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to get Type from the name provided: org.apache.reef.runtime.common.evaluator.parameters.DriverRemoteIdentifier]
Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 : 2017-07-31T22:08:26.7176597+00:00 0001 ERROR: ExceptionThrowing TangApplicationException
Encountered error [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to get Type from the name provided: org.apache.reef.runtime.common.evaluator.parameters.EvaluatorConfiguration]
Org.Apache.REEF.Tang.Util.AssemblyLoader Error: 0 : 2017-07-31T22:08:26.7176597+00:00 0001 ERROR: ExceptionThrowing TangApplicationException
Encountered error [Org.Apache.REEF.Tang.Exceptions.TangApplicationException: Not able to get Type from the name provided: org.apache.reef.runtime.common.evaluator.parameters.EvaluatorIdentifier]

Further down is this exception:

WARNING: ExceptionCaught UnauthorizedAccessException Cannot obtain machine status due to error
Encountered error [System.UnauthorizedAccessException: Access to the registry key 'Global' is denied.
   at Microsoft.Win32.RegistryKey.Win32Error(Int32 errorCode, String str)
   at Microsoft.Win32.RegistryKey.InternalGetValue(String name, Object defaultValue, Boolean doNotExpand, Boolean checkSecurity)
   at Microsoft.Win32.RegistryKey.GetValue(String name)
   at System.Diagnostics.PerformanceMonitor.GetData(String item)
   at System.Diagnostics.PerformanceCounterLib.GetPerformanceData(String item)
   at System.Diagnostics.PerformanceCounterLib.get_CategoryTable()
   at System.Diagnostics.PerformanceCounterLib.CounterExists(String category, String counter, Boolean& categoryExists)
   at System.Diagnostics.PerformanceCounterLib.CounterExists(String machine, String category, String counter)
   at System.Diagnostics.PerformanceCounter.InitializeImpl()
   at System.Diagnostics.PerformanceCounter.NextSample()
   at System.Diagnostics.PerformanceCounter.NextValue()
   at Org.Apache.REEF.Common.Runtime.MachineStatus.get_CurrentNodeCpuUsage()
   at Org.Apache.REEF.Common.Runtime.MachineStatus.ToString()]
Org.Apache.REEF.Common.Runtime.Evaluator.HeartBeatManager Stop: 0 : 2017-07-31T22:08:27.1551796+00:00 0001 EXIT:
7/31/2017 10:08:27 PM HeartBeatManager::HeartBeatManager. Duration: [00:00:00.0192656].

We are running the reef application as a superuser on the cluster with full admin privileges...
Any thoughts on why we are seeing these errors?

2). We are also getting a severe exception returned from the Bridge by the CLR:

Jul 31, 2017 10:08:34 PM org.apache.reef.wake.remote.transport.netty.AbstractNettyEventListener exceptionCaught
WARNING: ExceptionEvent: local: /10.2.0.8:9769 remote: /10.2.0.8:53595 :: java.io.IOException: An existing connection was forcibly closed by the remote host
Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler onNext
INFO: Completed task: SpinTask
Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler onNext
INFO: Return results to the client: ReturnValue
Jul 31, 2017 10:08:36 PM org.apache.reef.runtime.common.driver.client.LoggingJobStatusHandler onNext
INFO: In-process JobStatus: identifier: "Fluid" state: RUNNING message: "\254\355\000\005t\000\vReturnValue"
Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler onNext
INFO: CLR CompletedTaskHandler handler set, handling things with CLR handler.
Jul 31, 2017 10:08:36 PM org.apache.reef.javabridge.NativeBridge onError
SEVERE: Bridge received error from CLR: Exception in Call_ClrSystemCompletedTask_OnNext
Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at Org.Apache.REEF.Wake.Remote.Impl.Channel.Write(Byte[] message)
   at Org.Apache.REEF.Wake.Remote.Impl.Link`1.Write(T value)
   at Org.Apache.REEF.Wake.Remote.Impl.TransportClient`1.Send(T message)
   at Org.Apache.REEF.Wake.Remote.Impl.DefaultRemoteManager`1.ProxyObserver.OnNext(T message)
   at Org.Apache.REEF.Network.NetworkService.NsConnection`1.Write(T message)
   at Org.Apache.REEF.Fluid.Network.MessageService.Send(Object message)
   at Org.Apache.REEF.Fluid.DriverHandler.OnNext(ICompletedTask value)
   at Org.Apache.REEF.Driver.Bridge.ClrSystemHandler`1.OnNext(T value)
   at Org.Apache.REEF.Driver.Bridge.ClrSystemHandlerWrapper.Call_ClrSystemCompletedTask_OnNext(UInt64 handle, ICompletedTaskClr2Java clr2Java)
   at Java_org_apache_reef_javabridge_NativeInterop_clrSystemCompletedTaskHandlerOnNext(JNIEnv_* env, _jclass* cls, Int64 handler, _jobject* jcompletedTask, _jobject* jlogger)
Inner Exception: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
Inner Exception: null
Jul 31, 2017 10:08:36 PM org.apache.reef.runtime.common.driver.DriverStatusManager onError
WARNING: Shutting down the Driver with an exception:
java.lang.RuntimeException: Bridge received error from CLR: Exception in Call_ClrSystemCompletedTask_OnNext
Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at Org.Apache.REEF.Wake.Remote.Impl.Channel.Write(Byte[] message)
   at Org.Apache.REEF.Wake.Remote.Impl.Link`1.Write(T value)
   at Org.Apache.REEF.Wake.Remote.Impl.TransportClient`1.Send(T message)
   at Org.Apache.REEF.Wake.Remote.Impl.DefaultRemoteManager`1.ProxyObserver.OnNext(T message)
   at Org.Apache.REEF.Network.NetworkService.NsConnection`1.Write(T message)
   at Org.Apache.REEF.Fluid.Network.MessageService.Send(Object message)
   at Org.Apache.REEF.Fluid.DriverHandler.OnNext(ICompletedTask value)
   at Org.Apache.REEF.Driver.Bridge.ClrSystemHandler`1.OnNext(T value)
   at Org.Apache.REEF.Driver.Bridge.ClrSystemHandlerWrapper.Call_ClrSystemCompletedTask_OnNext(UInt64 handle, ICompletedTaskClr2Java clr2Java)
   at Java_org_apache_reef_javabridge_NativeInterop_clrSystemCompletedTaskHandlerOnNext(JNIEnv_* env, _jclass* cls, Int64 handler, _jobject* jcompletedTask, _jobject* jlogger)
Inner Exception: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
Inner Exception: null
        at org.apache.reef.javabridge.NativeBridge.onError(NativeBridge.java:36)
        at org.apache.reef.javabridge.NativeInterop.clrSystemCompletedTaskHandlerOnNext(Native Method)
        at org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler.onNext(JobDriver.java:397)
        at org.apache.reef.javabridge.generic.JobDriver$CompletedTaskHandler.onNext(JobDriver.java:378)
        at org.apache.reef.runtime.common.utils.BroadCastEventHandler.onNext(BroadCastEventHandler.java:40)
        at org.apache.reef.util.ExceptionHandlingEventHandler.onNext(ExceptionHandlingEventHandler.java:46)
        at org.apache.reef.runtime.common.utils.DispatchingEStage$1.onNext(DispatchingEStage.java:72)
        at org.apache.reef.runtime.common.utils.DispatchingEStage$1.onNext(DispatchingEStage.java:69)
        at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:182)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Any pointers you can provide are appreciated as always...

Thanks!
Stephen Weller