From: Josh Wills
Date: Tue, 29 Sep 2015 16:46:13 -0400
Subject: Re: LeaseExpiredExceptions and temp side effect files
To: user@crunch.apache.org
Cc: Jeff Quinn, Rahul Gupta-Iwasaki

Yeah, that makes sense to me -- not totally trivial to do, but it should be
possible.

J

On Tue, Sep 29, 2015 at 4:42 PM, Everett Anderson wrote:

> Hey,
>
> We have some leads. Increasing the datanode memory seems to help the
> immediate issue.
>
> However, we need a solution to our buildup of temporary outputs. We're
> exploring segmenting our pipeline with run()/cleanup() calls.
>
> I'm curious, though --
>
> Do you think it'd be possible for us to make a Crunch modification to
> optionally and actively clean up temporary outputs? It seems like the
> planner would know what those are.
>
> A temporary output would be any PCollection that isn't referenced
> outside of Crunch (or perhaps any that isn't explicitly marked as cached).
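[Editor's sketch] The run()/cleanup() segmentation mentioned above might look roughly like the following. This is a minimal illustration, not code from the thread: the class and DoFn names and paths are hypothetical, `Stage1Fn`/`Stage2Fn` are assumed to be `DoFn<String, String>` implementations defined elsewhere, and `cleanup(boolean force)` is assumed to behave as in 0.11-era MRPipeline, deleting temp outputs that are no longer referenced.

```java
// Hypothetical sketch: segment a Crunch pipeline with run()/cleanup() so
// temporary outputs under /tmp/crunch-XXXXXXX are reclaimed between stages
// instead of accumulating until done(). Requires the Crunch and Hadoop jars.
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;

public class SegmentedPipeline {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(SegmentedPipeline.class, new Configuration());

    // Stage 1: write an explicit (non-temporary) checkpoint and run it now.
    PCollection<String> stage1 = pipeline.readTextFile(args[0])
        .parallelDo(new Stage1Fn(), Writables.strings());
    pipeline.writeTextFile(stage1, args[1]);
    pipeline.run();

    // Ask Crunch to delete temp side-effect files no longer referenced by
    // any remaining target (force = false leaves live intermediates alone).
    pipeline.cleanup(false);

    // Stage 2: read back from the checkpoint on HDFS and finish.
    PCollection<String> stage2 = pipeline.readTextFile(args[1])
        .parallelDo(new Stage2Fn(), Writables.strings());
    pipeline.writeTextFile(stage2, args[2]);
    pipeline.done(); // final run plus full temp-directory cleanup
  }
}
```

The design point is that each `run()` forces materialization at a checkpoint the user owns, so the planner no longer needs the temp files behind it.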
>
>
> On Thu, Sep 24, 2015 at 5:46 PM, Josh Wills wrote:
>
>> Hrm. If you never call Pipeline.done, you should never clean up the
>> temporary files for the job...
>>
>> On Thu, Sep 24, 2015 at 5:44 PM, Everett Anderson wrote:
>>
>>> While we tried to take comfort in the fact that we'd seen this only on
>>> HD-based cc2.8xlarges, I'm afraid we're now seeing it when processing
>>> larger amounts of data on SSD-based c3.8xlarges.
>>>
>>> My two hypotheses are
>>>
>>> 1) Somehow these temp files are getting cleaned up before they're
>>> accessed for the last time. Perhaps either something in HDFS or Hadoop
>>> cleans up these temp directories, or perhaps there's a bug in Crunch's
>>> planner.
>>>
>>> 2) HDFS has chosen 3 machines to replicate data to, but it is performing
>>> a very lopsided replication. While the cluster overall looks like it has
>>> HDFS capacity, perhaps a small subset of the machines is actually at
>>> capacity. Things seem to fail in obscure ways when running out of disk.
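[Editor's sketch] Hypothesis (2) is directly checkable: a cluster-wide capacity number can hide one nearly-full datanode. A minimal sketch of dumping per-datanode usage via the HDFS client API follows -- this is the same information `hdfs dfsadmin -report` prints; `getDataNodeStats()` is assumed available on `DistributedFileSystem` for the Hadoop version in use, and the program needs the Hadoop jars plus cluster config on its classpath.

```java
// Print per-datanode DFS usage to spot lopsided replication: a cluster can
// look healthy in aggregate while one node is at capacity and failing writes.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeUsage {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS etc. from core-site.xml / hdfs-site.xml.
    FileSystem fs = FileSystem.get(new Configuration());
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
      System.out.printf("%-30s used=%5.1f%% remaining=%,d MB%n",
          dn.getHostName(),
          dn.getDfsUsedPercent(),
          dn.getRemaining() / (1024L * 1024));
    }
  }
}
```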
>>>
>>>
>>> 2015-09-24 23:28:58,850 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Could not read runtime node information
>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48)
>>>   at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
>>>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:172)
>>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
>>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
>>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
>>> Caused by: java.io.FileNotFoundException: File does not exist: /tmp/crunch-2031291770/p567/REDUCE
>>>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
>>>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497)
>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
>>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
>>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>
>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>>>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>>>   at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1147)
>>>   at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1135)
>>>   at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1125)
>>>   at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:273)
>>>   at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:240)
>>>   at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:233)
>>>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1298)
>>>   at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
>>>   at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
>>>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>   at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296)
>>>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
>>>   at org.apache.crunch.util.DistCache.read(DistCache.java:72)
>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
>>>   ... 9 more
>>> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/crunch-2031291770/p567/REDUCE
>>>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
>>>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497)
>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
>>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
>>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>
>>>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>>>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215)
>>>   at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>>>   at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:219)
>>>   at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1145)
>>>   ... 22 more
>>>
>>>
>>> On Fri, Aug 21, 2015 at 3:52 PM, Jeff Quinn wrote:
>>>
>>>> Also worth noting, we inspected the hadoop configuration defaults that
>>>> the AWS EMR service populates for the two different instance types; for
>>>> mapred-site.xml, core-site.xml, and hdfs-site.xml, all settings were
>>>> identical, with the exception of slight differences in the JVM memory
>>>> allotted. We further investigated the max number of file descriptors for
>>>> each instance type via ulimit, and saw no differences there either.
>>>>
>>>> So I'm not sure what the main difference is between these two clusters
>>>> that would cause these very different outcomes, other than cc2.8xlarge
>>>> having spinning disks and c3.8xlarge having SSDs.
>>>>
>>>> On Fri, Aug 21, 2015 at 1:03 PM, Everett Anderson wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> Jeff graciously agreed to try it out.
>>>>>
>>>>> I'm afraid we're still getting failures on that instance type with
>>>>> 0.11 plus the patches, and the cluster ended up in a state where no new
>>>>> applications could be submitted afterwards.
>>>>>
>>>>> The errors when running the pipeline seem to be similarly HDFS
>>>>> related. It's quite odd.
>>>>>
>>>>> Examples when using 0.11 + the patches:
>>>>>
>>>>>
>>>>> 2015-08-20 23:17:50,455 WARN [Thread-38] org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/crunch-274499863/p504/output/_temporary/1/_temporary/attempt_1440102643297_out0_0107_r_000001_0/out0-r-00001" - Aborting...
>>>>>
>>>>>
>>>>> 2015-08-20 22:39:42,184 WARN [Thread-51] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-274499863/p510/output/_temporary/1/_temporary/attempt_1440102643297_out12_0103_r_000167_2/out12-r-00167 (inode 83784): File does not exist. [Lease. Holder: DFSClient_attempt_1440102643297_0103_r_000167_2_964529009_1, pendingcreates: 24]
>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.abandonBlock(FSNamesystem.java:3486)
>>>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.abandonBlock(NameNodeRpcServer.java:687)
>>>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.abandonBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:467)
>>>>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:635)
>>>>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>>>>>
>>>>>   at org.apache.hadoop.ipc.Client.call(Client.java:1468)
>>>>>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>>>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:241)
>>>>>   at com.sun.proxy.$Proxy13.abandonBlock(Unknown Source)
>>>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.abandonBlock(ClientNamenodeProtocolTranslatorPB.java:376)
>>>>>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>>>>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>   at com.sun.proxy.$Proxy14.abandonBlock(Unknown Source)
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1377)
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
>>>>> 2015-08-20 22:39:42,184 WARN [Thread-51] org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/crunch-274499863/p510/output/_temporary/1/_temporary/attempt_1440102643297_out12_0103_r_000167_2/out12-r-00167" - Aborting...
>>>>>
>>>>>
>>>>> 2015-08-20 23:34:59,276 INFO [Thread-37] org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
>>>>> java.io.IOException: Bad connect ack with firstBadLink as 10.55.1.103:50010
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472)
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
>>>>> 2015-08-20 23:34:59,276 INFO [Thread-37] org.apache.hadoop.hdfs.DFSClient: Abandoning BP-835517662-10.55.1.32-1440102626965:blk_1073828261_95268
>>>>> 2015-08-20 23:34:59,278 INFO [Thread-37] org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.55.1.103:50010
>>>>> 2015-08-20 23:34:59,278 WARN [Thread-37] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
>>>>> java.io.IOException: Unable to create new block.
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1386)
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
>>>>> 2015-08-20 23:34:59,278 WARN [Thread-37] org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/crunch-274499863/p504/output/_temporary/1/_temporary/attempt_1440102643297_out0_0107_r_000001_2/out0-r-00001" - Aborting...
>>>>> 2015-08-20 23:34:59,279 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: java.io.IOException: Bad connect ack with firstBadLink as 10.55.1.103:50010
>>>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74)
>>>>>   at org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64)
>>>>>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
>>>>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
>>>>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
>>>>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
>>>>> Caused by: java.io.IOException: Bad connect ack with firstBadLink as 10.55.1.103:50010
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472)
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 21, 2015 at 11:59 AM, Josh Wills wrote:
>>>>>
>>>>>> Curious how this went. :)
>>>>>>
>>>>>> On Tue, Aug 18, 2015 at 4:26 PM, Everett Anderson wrote:
>>>>>>
>>>>>>> Sure, let me give it a try. I'm going to take 0.11 and patch it with
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/CRUNCH-553
>>>>>>> https://issues.apache.org/jira/browse/CRUNCH-517
>>>>>>>
>>>>>>> as we also rely on 517.
>>>>>>>
>>>>>>> On Tue, Aug 18, 2015 at 4:09 PM, Josh Wills wrote:
>>>>>>>
>>>>>>>> (In particular, I'm wondering if something in CRUNCH-481 is related
>>>>>>>> to this problem.)
>>>>>>>>
>>>>>>>> On Tue, Aug 18, 2015 at 4:07 PM, Josh Wills wrote:
>>>>>>>>
>>>>>>>>> Hey Everett,
>>>>>>>>>
>>>>>>>>> Shot in the dark -- would you mind trying it w/ 0.11.0-hadoop2 w/
>>>>>>>>> the 553 patch? Is that easy to do?
>>>>>>>>>
>>>>>>>>> J
>>>>>>>>>
>>>>>>>>> On Tue, Aug 18, 2015 at 3:18 PM, Everett Anderson <everett@nuna.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I verified that the pipeline succeeds on the same cc2.8xlarge
>>>>>>>>>> hardware when setting crunch.max.running.jobs to 1. I generally
>>>>>>>>>> feel that the pipeline application logic itself is sound at this
>>>>>>>>>> point. It could be that this is just taxing these machines too
>>>>>>>>>> hard and we need to increase the number of retries?
>>>>>>>>>>
>>>>>>>>>> It reliably fails on this hardware when crunch.max.running.jobs is
>>>>>>>>>> set to its default.
>>>>>>>>>>
>>>>>>>>>> Can you explain a little what the /tmp/crunch-XXXXXXX files are,
>>>>>>>>>> as well as how Crunch uses side effect files? Do you know if HDFS
>>>>>>>>>> would clean up those directories from underneath Crunch?
>>>>>>>>>>
>>>>>>>>>> There are usually 4 failed applications, failing due to reduces.
>>>>>>>>>> The failures seem to be one of the following three kinds -- (1)
>>>>>>>>>> no lease on the file, (2) file not found, (3)
>>>>>>>>>> SocketTimeoutException.
>>>>>>>>>>
>>>>>>>>>> Examples:
>>>>>>>>>>
>>>>>>>>>> [1] No lease exception
>>>>>>>>>>
>>>>>>>>>> Error: org.apache.crunch.CrunchRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-4694113/p662/output/_temporary/1/_temporary/attempt_1439917295505_out7_0018_r_000003_1/out7-r-00003: File does not exist. Holder DFSClient_attempt_1439917295505_0018_r_000003_1_824053899_1 does not have any open files.
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2944)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3008)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2988)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:641)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:484)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>>>>>>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
>>>>>>>>>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74)
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64)
>>>>>>>>>>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
>>>>>>>>>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>>>>>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
>>>>>>>>>> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-4694113/p662/output/_temporary/1/_temporary/attempt_1439917295505_out7_0018_r_000003_1/out7-r-00003: File does not exist. Holder DFSClient_attempt_1439917295505_0018_r_000003_1_824053899_1 does not have any open files.
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2944)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3008)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2988)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:641)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:484)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>>>>>>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
>>>>>>>>>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>>>>>>>>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>>>>>>>>>>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>>>>>>>>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215)
>>>>>>>>>>   at com.sun.proxy.$Proxy13.complete(Unknown Source)
>>>>>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>>>>>>>>>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>>>>>>>>>>   at com.sun.proxy.$Proxy13.complete(Unknown Source)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:404)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2130)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2114)
>>>>>>>>>>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>>>>>>>>>>   at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
>>>>>>>>>>   at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1289)
>>>>>>>>>>   at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:87)
>>>>>>>>>>   at org.apache.crunch.io.CrunchOutputs$OutputState.close(CrunchOutputs.java:300)
>>>>>>>>>>   at org.apache.crunch.io.CrunchOutputs.close(CrunchOutputs.java:180)
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:72)
>>>>>>>>>>   ... 9 more
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [2] File does not exist
>>>>>>>>>>
>>>>>>>>>> 2015-08-18 17:36:10,195 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1439917295505_0034_r_000004_1: Error: org.apache.crunch.CrunchRuntimeException: Could not read runtime node information
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48)
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
>>>>>>>>>>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:172)
>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
>>>>>>>>>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>>>>>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
>>>>>>>>>> Caused by: java.io.FileNotFoundException: File does not exist: /tmp/crunch-4694113/p470/REDUCE
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
>>>>>>>>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>>>>>>>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
>>>>>>>>>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>>>>>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>>>>>>>>
>>>>>>>>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>>>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>>>>>>>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>>>>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>>>>>>>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>>>>>>>>>>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1147)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1135)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1125)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:273)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:240)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:233)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1298)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
>>>>>>>>>>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296)
>>>>>>>>>>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
>>>>>>>>>>   at org.apache.crunch.util.DistCache.read(DistCache.java:72)
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
>>>>>>>>>>   ... 9 more
>>>>>>>>>>
>>>>>>>>>> [3] SocketTimeoutException
>>>>>>>>>>
>>>>>>>>>> Error: org.apache.crunch.CrunchRuntimeException: java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.55.1.229:35720 remote=/10.55.1.230:9200]
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74)
>>>>>>>>>>   at org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64)
>>>>>>>>>>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
>>>>>>>>>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>>>>>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
>>>>>>>>>> Caused by: java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.55.1.229:35720 remote=/10.55.1.230:9200]
>>>>>>>>>>   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>>>>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>>>>>>>>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>>>>>>>>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
>>>>>>>>>>   at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>>>>>>>>>   at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>>>>>>>>>   at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1985)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1075)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1042)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1186)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:935)
>>>>>>>>>>   at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStre= am.java:491) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Aug 14, 2015 at 3:54 PM, Everett Anderson < >>>>>>>>>> everett@nuna.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 14, 2015 at 3:26 PM, Josh Wills >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> Hey Everett, >>>>>>>>>>>> >>>>>>>>>>>> Initial thought-- there are lots of reasons for lease expired >>>>>>>>>>>> exceptions, and their usually more symptomatic of other proble= ms in the >>>>>>>>>>>> pipeline. Are you sure none of the jobs in the Crunch pipeline= on the >>>>>>>>>>>> non-SSD instances are failing for some other reason? I'd be su= rprised if no >>>>>>>>>>>> other errors showed up in the app master, although there are r= eports of >>>>>>>>>>>> some weirdness around LeaseExpireds when writing to S3-- but y= ou're not >>>>>>>>>>>> doing that here, right? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> We're reading from and writing to HDFS, here. (We've copied in >>>>>>>>>>> input from S3 to HDFS in another step.) >>>>>>>>>>> >>>>>>>>>>> There are a few exceptions in the logs. Most seem related to >>>>>>>>>>> missing temp files. >>>>>>>>>>> >>>>>>>>>>> Let me see if I can reproduce it with crunch.max.running.jobs >>>>>>>>>>> set to 1 to try to narrow down the originating failure. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> J >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Aug 14, 2015 at 2:10 PM, Everett Anderson < >>>>>>>>>>>> everett@nuna.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I recently started trying to run our Crunch pipeline on more >>>>>>>>>>>>> data and have been trying out different AWS instance types in= anticipation >>>>>>>>>>>>> of our storage and compute needs. 
I was using EMR 3.8 (so Hadoop 2.4.0) with Crunch 0.12 (patched with the CRUNCH-553 fix).

Our pipeline finishes fine in these cluster configurations:

- 50 c3.4xlarge Core, 0 Task
- 10 c3.8xlarge Core, 0 Task
- 25 c3.8xlarge Core, 0 Task

However, it always fails on the same data when using 10 cc2.8xlarge Core instances.

The biggest obvious hardware difference is that the cc2.8xlarges use hard disks instead of SSDs.

While it's a little hard to track down the exact originating failure, I think it's from errors like:

2015-08-13 21:34:38,379 ERROR [IPC Server handler 24 on 45711] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1439499407003_0028_r_000153_1 - exited : org.apache.crunch.CrunchRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-970849245/p662/output/_temporary/1/_temporary/attempt_1439499407003_out7_0028_r_000153_1/out7-r-00153: File does not exist. Holder DFSClient_attempt_1439499407003_0028_r_000153_1_609888542_1 does not have any open files.

Those paths look like these side effect files.

Would Crunch have generated applications that depend on side effect paths as input across MapReduce applications and something in HDFS is cleaning up those paths, unaware of the higher level dependencies?
AWS configures Hadoop differently for each instance type, and might have more aggressive cleanup settings on HDs, though this is a very uninformed hypothesis.

A sample full log is attached.

Thanks for any guidance!

- Everett

*DISCLAIMER:* The contents of this email, including any attachments, may contain information that is confidential, proprietary in nature, protected health information (PHI), or otherwise protected by law from disclosure, and is solely for the use of the intended recipient(s). If you are not the intended recipient, you are hereby notified that any use, disclosure or copying of this email, including any attachments, is unauthorized and strictly prohibited. If you have received this email in error, please notify the sender of this email. Please delete this and all copies of this email from your system. Any opinions either expressed or implied in this email and all attachments, are those of its author only, and do not necessarily reflect those of Nuna Health, Inc.

--
Director of Data Science
Cloudera
Twitter: @josh_wills
Yeah, that makes sense to me-- not totally trivial to do, but it should be possible.

J

On Tue, Sep 29, 2015 at 4:42 PM, Everett Anderson <everett@nuna.com> wrote:
Hey,

We have some leads. Increasing the datanode memory seems to help the immediate issue.

However, we need a solution to our buildup of temporary outputs. We're exploring segmenting our pipeline with run()/cleanup() calls.

I'm curious, though --
Do you think it'd be possible for us to make a Crunch modification to optionally actively clean up temporary outputs? It seems like the planner would know what those are.

A temporary output would be any PCollection that isn't referenced outside of Crunch (or perhaps ones that aren't explicitly marked as cached).


On Thu, Sep 24, 2015 at 5:46 PM, Josh Wills <jwills@cloudera.com> wrote:
Hrm. If you never call Pipeline.done, you should never clean up the temporary files for the job...
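To make the run()/done() distinction concrete, here is a minimal sketch of a segmented pipeline. This is a hypothetical illustration, not code from the thread; the class name and argument handling are made up, and it assumes the standard Crunch MRPipeline API (run() executes the jobs planned so far and leaves /tmp/crunch-* intact, while done() also deletes those temporary directories):

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

public class SegmentedPipeline {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(SegmentedPipeline.class);

    PCollection<String> lines = pipeline.readTextFile(args[0]);
    // ... first stage of processing, writing intermediate outputs ...

    pipeline.run();   // runs the jobs planned so far; temp files stay alive

    // ... later stages that may still read earlier temp outputs ...

    pipeline.done();  // runs any remaining jobs AND removes /tmp/crunch-* dirs
  }
}
```

The point Josh is making above is the flip side: if done() is never called, Crunch itself should never be the thing deleting those temp directories mid-run.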

On Thu, Sep 24, 2015 at 5:44 PM, Everett Anderson <everett@nuna.com> wrote:
While we tried to take comfort in the fact that we'd only seen this on HD-based cc2.8xlarges, I'm afraid we're now seeing it when processing larger amounts of data on SSD-based c3.4xlarges.

My two hypotheses are
1) Somehow these temp files are getting cleaned up before they're accessed for the last time. Perhaps either something in HDFS or Hadoop cleans up these temp directories, or perhaps there's a bug in Crunch's planner.

2) HDFS has chosen 3 machines to replicate data to, but it is performing a very lopsided replication. While the cluster overall looks like it has HDFS capacity, perhaps a small subset of the machines is actually at capacity. Things seem to fail in obscure ways when running out of disk.
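One quick way to test the lopsided-replication hypothesis (not something done in the thread) is to compare per-datanode usage with the standard HDFS admin report; the grep pattern here is just one way to trim its output and assumes the usual report field names:

```shell
# Compare disk usage across datanodes to see whether a small subset
# of machines is near capacity while the cluster as a whole looks fine.
# Run on a node with the hdfs CLI and access to the NameNode.
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%|DFS Remaining'
```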


2015-09-24 23:28:58,850 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Could not read runtime node information
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48)
	at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:172)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.io.FileNotFoundException: File does not exist: /tmp/crunch-2031291770/p567/REDUCE
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1147)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1135)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1125)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:273)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:240)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:233)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1298)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
	at org.apache.crunch.util.DistCache.read(DistCache.java:72)
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
	... 9 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/crunch-2031291770/p567/REDUCE
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215)
	at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
	at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:219)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1145)
	... 22 more
On Fri, Aug 21, 2015 at 3:52 PM, Jeff Quinn <jeff@nuna.com> wrote:
Also worth noting, we inspected the hadoop configuration defaults that the AWS EMR service populates for the two different instance types; for mapred-site.xml, core-site.xml, and hdfs-site.xml all settings were identical, with the exception of slight differences in JVM memory allotted. We further investigated the max number of file descriptors for each instance type via ulimit, and saw no differences there either.

So not sure what the main difference is between these two clusters that would cause these very different outcomes, other than cc2.8xlarge having spinning disks and c3.8xlarge having SSDs.

On Fri, Aug 21, 2015 at 1:03 PM, Everett Anderson <everett@nuna.com> wrote:
Hey,

Jeff graciously agreed to try it out.

I'm afraid we're still getting failures on that instance type, though with 0.11 with the patches, the cluster ended up in a state that no new applications could be submitted afterwards.

The errors when running the pipeline seem to be similarly HDFS related. It's quite odd.
Examples when using 0.11 + the patches:


2015-08-20 23:17:50,455 WARN [Thread-38] org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/crunch-274499863/p504/output/_temporary/1/_temporary/attempt_1440102643297_out0_0107_r_000001_0/out0-r-00001" - Aborting...


2015-08-20 22:39:42,184 WARN [Thread-51] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-274499863/p510/output/_temporary/1/_temporary/attempt_1440102643297_out12_0103_r_000167_2/out12-r-00167 (inode 83784): File does not exist. [Lease. Holder: DFSClient_attempt_1440102643297_0103_r_000167_2_964529009_1, pendingcreates: 24]
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.abandonBlock(FSNamesystem.java:3486)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.abandonBlock(NameNodeRpcServer.java:687)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.abandonBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:467)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:635)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:241)
	at com.sun.proxy.$Proxy13.abandonBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.abandonBlock(ClientNamenodeProtocolTranslatorPB.java:376)
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy14.abandonBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1377)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
2015-08-20 22:39:42,184 WARN [Thread-51] org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/crunch-274499863/p510/output/_temporary/1/_temporary/attempt_1440102643297_out12_0103_r_000167_2/out12-r-00167" - Aborting...



2015-08-20 23:34:59,276 INFO [Thread-37] org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 10.55.1.103:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
2015-08-20 23:34:59,276 INFO [Thread-37] org.apache.hadoop.hdfs.DFSClient: Abandoning BP-835517662-10.55.1.32-1440102626965:blk_1073828261_95268
2015-08-20 23:34:59,278 INFO [Thread-37] org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.55.1.103:50010
2015-08-20 23:34:59,278 WARN [Thread-37] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Unable to create new block.
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1386)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
2015-08-20 23:34:59,278 WARN [Thread-37] org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/crunch-274499863/p504/output/_temporary/1/_temporary/attempt_1440102643297_out0_0107_r_000001_2/out0-r-00001" - Aborting...
2015-08-20 23:34:59,279 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: java.io.IOException: Bad connect ack with firstBadLink as 10.55.1.103:50010
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74)
	at org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.io.IOException: Bad connect ack with firstBadLink as 10.55.1.103:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)







On Fri, Aug 21, 2015 at 11:59 AM, Josh Wills <jwills@cloudera.com> wrote:
Curious how this went. :)

On Tue, Aug 18, 2015 at 4:26 PM, Everett Anderson <everett@nuna.com> wrote:
Sure, let me give it a try. I'm going to take 0.11 and patch it with


as we also rely on 517.



On Tue, Aug 18, 2015 at 4:09 PM, Josh Wills <jwills@cloudera.com> wrote:
(In particular, I'm wondering if something in CRUNCH-481 is related to this problem.)
On Tue, Aug 18, 2015 at 4:07 PM, Josh Wills <jwills@cloudera.com> wrote:
Hey Everett,

Shot in the dark-- would you mind trying it w/0.11.0-hadoop2 w/the 553 patch? Is that easy to do?

J
On Tue, Aug 18, 2015 at 3:18 PM, Everett Anderson <everett@nuna.com> wrote:
Hi,

I verified that the pipeline succeeds on the same cc2.8xlarge hardware when setting crunch.max.running.jobs to 1. I generally feel like the pipeline application logic itself is sound at this point. It could be that this is just taxing these machines too hard and we need to increase the number of retries?

It reliably fails on this hardware when crunch.max.running.jobs is set to its default.
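For reference, here is a minimal sketch of pinning crunch.max.running.jobs to 1 in code to serialize the MapReduce jobs while debugging. The class name is hypothetical, and this assumes the standard MRPipeline constructor that accepts a Hadoop Configuration:

```java
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;

public class SerialJobsPipeline {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Limit Crunch to one concurrent MapReduce job so the first
    // failing job is easy to identify in the app master logs.
    conf.setInt("crunch.max.running.jobs", 1);
    Pipeline pipeline = new MRPipeline(SerialJobsPipeline.class, conf);
    // ... build and run the pipeline ...
  }
}
```

If the driver goes through ToolRunner/GenericOptionsParser, the same property can be passed on the command line as -Dcrunch.max.running.jobs=1 instead.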
Can you explain a little what the /tmp/crunch-XXXXXXX files are, as well as how Crunch uses side effect files? Do you know if HDFS would clean up those directories from underneath Crunch?

There are usually 4 failed applications, failing due to reduces. The failures seem to be one of the following three kinds -- (1) No lease on <side effect file>, (2) File not found </tmp/crunch-XXXXXXX> file, (3) SocketTimeoutException.

Examples:

[1] No lease exception

Error: org.apache.crunch.CrunchRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-4694113/p662/output/_temporary/1/_temporary/attempt_1439917295505_out7_0018_r_000003_1/out7-r-00003: File does not exist. Holder DFSClient_attempt_1439917295505_0018_r_000003_1_824053899_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2944)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3008)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2988)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:641)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:484)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74)
	at org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-4694113/p662/output/_temporary/1/_temporary/attempt_1439917295505_out7_0018_r_000003_1/out7-r-00003: File does not exist. Holder DFSClient_attempt_1439917295505_0018_r_000003_1_824053899_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2944)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3008)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2988)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:641)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:484)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215)
	at com.sun.proxy.$Proxy13.complete(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
	at com.sun.proxy.$Proxy13.complete(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:404)
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2130)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2114)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
	at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1289)
	at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:87)
	at org.apache.crunch.io.CrunchOutputs$OutputState.close(CrunchOutputs.java:300)
	at org.apache.crunch.io.CrunchOutputs.close(CrunchOutputs.java:180)
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:72)
	... 9 more


[2] File does not exist

2015-08-18 17:36:10,195 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1439917295505_0034_r_000004_1: Error: org.apache.crunch.CrunchRuntimeException: Could not read runtime node information
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48)
	at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:172)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.io.FileNotFoundException: File does not exist: /tmp/crunch-4694113/p470/REDUCE
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1147)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1135)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1125)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:273)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:240)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:233)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1298)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
	at org.apache.crunch.util.DistCache.read(DistCache.java:72)
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
	... 9 more
[3] SocketTimeoutException
Error: org.apache.crunch.CrunchRuntimeException: java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.55.1.229:35720 remote=/10.55.1.230:9200]
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74)
	at org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.55.1.229:35720 remote=/10.55.1.230:9200]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1985)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1075)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1042)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1186)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:935)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:491)












On Fri, Aug 14, 2015 at 3:54 PM, Everett Anderson <everett@nuna.com> wrote:


On Fri, Aug 14, 2015 at 3:26 PM, Josh Wills <jwills@cloudera.com> wrote:
Hey Everett,

Initial thought-- there are lots of reasons for lease expired exceptions, and they're usually more symptomatic of other problems in the pipeline. Are you sure none of the jobs in the Crunch pipeline on the non-SSD instances are failing for some other reason? I'd be surprised if no other errors showed up in the app master, although there are reports of some weirdness around LeaseExpireds when writing to S3-- but you're not doing that here, right?

We're reading from and writing to HDFS, here. (We've copied in input from S3 to HDFS in another step.)

There are a few exceptions in the logs. Most seem related to missing temp files.

Let me see if I can reproduce it with crunch.max.running.jobs set to 1 to try to narrow down the originating failure.
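(For anyone following along, a hedged sketch of what that run might look like -- the jar name and main class below are placeholders, and it assumes the pipeline driver parses generic Hadoop options, e.g. via ToolRunner, so -D settings reach its Configuration:

```shell
# Serialize Crunch's MapReduce jobs so the first real failure surfaces
# clearly instead of being interleaved with concurrent jobs.
# "pipeline.jar" and "com.example.MyPipeline" are illustrative names.
hadoop jar pipeline.jar com.example.MyPipeline \
    -D crunch.max.running.jobs=1 \
    hdfs:///input hdfs:///output
```
)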



J

On Fri, Aug 14, 2015 at 2:10 PM, Everett Anderson <everett@nuna.com> wrote:
Hi,

I recently started trying to run our Crunch pipeline on more data and have been trying out different AWS instance types in anticipation of our storage and compute needs.

I was using EMR 3.8 (so Hadoop 2.4.0) with Crunch 0.12 (patched with the CRUNCH-553 fix).

Our pipeline finishes fine in these cluster configurations:
  • 50 c3.4xlarge Core, 0 Task
  • 10 c3.8xlarge Core, 0 Task
  • 25 c3.8xlarge Core, 0 Task
However, it always fails on the same data when using 10 cc2.8xlarge Core instances.

The biggest obvious hardware difference is that the cc2.8xlarges use hard disks instead of SSDs.
While it's a little hard to track down the exact originating failure, I think it's from errors like:

2015-08-13 21:34:38,379 ERROR [IPC Server handler 24 on 45711] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1439499407003_0028_r_000153_1 - exited : org.apache.crunch.CrunchRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-970849245/p662/output/_temporary/1/_temporary/attempt_1439499407003_out7_0028_r_000153_1/out7-r-00153: File does not exist. Holder DFSClient_attempt_1439499407003_0028_r_000153_1_609888542_1 does not have any open files.

Those paths look like these side effect files.

Would Crunch have generated applications that depend on side effect paths as input across MapReduce applications, with something in HDFS cleaning up those paths, unaware of the higher-level dependencies? AWS configures Hadoop differently for each instance type, and might have more aggressive cleanup settings on HDs, though this is a very uninformed hypothesis.
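(One rough way to test that hypothesis would be to watch the pipeline's temp directory while it runs and see whether anything under it disappears before the final job completes -- a sketch, with the crunch-* path taken from the error above as an illustrative example:

```shell
# Recursively list the Crunch temp outputs; rerun periodically and
# diff the listings to spot files vanishing mid-pipeline.
hadoop fs -ls -R /tmp/crunch-970849245 | head -n 50

# Track total size over time; a sudden drop before the pipeline
# finishes would point at premature cleanup.
hadoop fs -du -s -h /tmp/crunch-970849245
```
)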

A sample full log is attached.

Thanks for any guidance!

- Everett


DISCLAIMER: The contents of this email, including any attachments, may contain information that is confidential, proprietary in nature, protected health information (PHI), or otherwise protected by law from disclosure, and is solely for the use of the intended recipient(s). If you are not the intended recipient, you are hereby notified that any use, disclosure or copying of this email, including any attachments, is unauthorized and strictly prohibited. If you have received this email in error, please notify the sender of this email. Please delete this and all copies of this email from your system. Any opinions either expressed or implied in this email and all attachments, are those of its author only, and do not necessarily reflect those of Nuna Health, Inc.


--
Director of Data Science






















