Delivered-To: mailing list dev@spark.apache.org
Date: Mon, 20 Jun 2016 13:21:00 -0700
Subject: Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath
From: Marcelo Vanzin
To: Jonathan Kelly
Cc: user, Spark dev list

It doesn't hurt to have a bug tracking it, in case anyone else has time
to look at it before I do.

On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly wrote:
> Thanks for the confirmation! Shall I cut a JIRA issue?
>
> On Mon, Jun 20, 2016 at 10:42 AM Marcelo Vanzin wrote:
>>
>> I just tried this locally and can see the wrong behavior you mention.
>> I'm running a somewhat old build of 2.0, but I'll take a look.
>>
>> On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly wrote:
>> > Does anybody have any thoughts on this?
>> >
>> > On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly wrote:
>> >>
>> >> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit
>> >> bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
>> >> log4j.properties is not getting picked up in the executor classpath
>> >> (and driver classpath for yarn-cluster mode), so Hadoop's
>> >> log4j.properties file is taking precedence in the YARN containers.
>> >>
>> >> Spark's log4j.properties file is correctly being bundled into the
>> >> __spark_conf__.zip file and getting added to the DistributedCache,
>> >> but it is not in the classpath of the executor, as evidenced by the
>> >> following command, which I ran in spark-shell:
>> >>
>> >> scala> sc.parallelize(Seq(1)).map(_ => getClass().getResource("/log4j.properties")).first
>> >> res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
>> >>
>> >> I then ran the following in spark-shell to verify the classpath of
>> >> the executors:
>> >>
>> >> scala> sc.parallelize(Seq(1)).map(_ => System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e => !e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
>> >> ...
>> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
>> >> /etc/hadoop/conf
>> >> ...
>> >>
>> >> So the JVM has this nonexistent __spark_conf__ directory in the
>> >> classpath when it should really be __spark_conf__.zip (which is
>> >> actually a symlink to a directory, despite the .zip filename).
>> >>
>> >> % sudo ls -l /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>> >> total 20
>> >> -rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
>> >> -rwx------ 1 yarn yarn  594 Jun 18 01:26 default_container_executor_session.sh
>> >> -rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
>> >> -rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
>> >> lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip -> /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
>> >> lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ -> /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
>> >> drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
>> >>
>> >> Does anybody know why this is happening? Is this a bug in Spark, or
>> >> is it the JVM doing this (possibly because the extension is .zip)?
>> >>
>> >> Thanks,
>> >> Jonathan
>>
>> --
>> Marcelo

--
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org