From: vanzin
To: reviews@spark.apache.org
Reply-To: reviews@spark.apache.org
Subject: [GitHub] spark pull request: [SPARK-4048] Enhance and extend hadoop-provided profile
Date: Tue, 28 Oct 2014 19:50:28 +0000 (UTC)

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/2982

    [SPARK-4048] Enhance and extend hadoop-provided profile.

    This change does a few things to make the hadoop-provided profile more useful:

    - Create new profiles for other libraries / services that might be provided by the infrastructure (a build sketch follows this list).
    - Simplify and fix the poms so that the profiles are only activated while building assemblies.
    - Fix tests so that they're able to run when the profiles are activated.
    - Add a new env variable to be used by distributions that use these profiles to provide the runtime classpath for Spark jobs and daemons.
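As a rough illustration of how a distributor would activate these profiles, here is a sketch of a build invocation. The hadoop-provided and hive-provided profile names come from this change itself (hive-provided appears in a commit message below); the choice of Maven goals and the -DskipTests flag are assumptions for the example, not something the pull request prescribes.

    # Sketch: build a reduced assembly that leaves Hadoop and Hive classes
    # out of the jar, to be supplied by the infrastructure at runtime.
    # (Profile names are from this change; goals and flags are assumptions.)
    $ mvn -Phadoop-provided -Phive-provided -DskipTests clean package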
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-4048

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2982.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2982

----

commit 343ab596e2aa77b4d46f6bea65fed024a6b46168
Author: Marcelo Vanzin
Date: 2014-10-20T18:30:47Z

    Rework the "hadoop-provided" profile, add new ones.

    The "hadoop-provided" profile should only apply during packaging because, for example, "spark-core" should still have a compile-time dependency on hadoop, since it exposes hadoop types in its API. So reorganize the dependencies a bit so that the scopes are overridden in the packaging targets. Also, a lot of the dependencies packaged in the examples/ assembly are already provided by the main assembly, so clean those up.

    Also, add similar profiles for hive, parquet, flume and hbase (the last two just used by the examples/ code, although the flume one could also potentially be used by users' poms when packaging the flume backend).

    This change also includes a fix to parameterize the hbase artifact, since the structure of the dependencies has changed along the 0.9x line. It also cleans up some unneeded dependencies in a few poms.

commit 39d5a55ac46315da5c2fb4b1327aac18da89d812
Author: Marcelo Vanzin
Date: 2014-10-21T16:59:44Z

    Re-enable maven-install-plugin for a few projects.

    Without this, running specific targets directly (e.g. mvn -f assembly/pom.xml) doesn't work.

commit 0beb2d3cf05ba62300373948a4aaa4b1de816f61
Author: Marcelo Vanzin
Date: 2014-10-23T20:19:41Z

    Propagate classpath to child processes during testing.

    When unit tests spawn child processes that use the Spark assembly jar, those processes need all the classes required to run Spark. If the assembly is built using the "*-provided" profiles, some of those classes will not be part of the assembly, although they will be on the unit test's class path since maven/sbt will make the dependencies available. So this change extends the unit test's class path to the child processes so that all classes are available.

    I also parameterized the "spark.test.home" setting so that you can do things like "mvn -f core/pom.xml test" and have it work (as long as you set it to a proper value; unfortunately maven makes this super painful to do automatically, because of things like MNG-5522).

commit 894f354c3624045d1567d8b30cf547dce78f833f
Author: Marcelo Vanzin
Date: 2014-10-23T22:04:11Z

    Introduce "SPARK_DIST_CLASSPATH".

    This env variable is processed by compute-classpath.sh and appended to the generated classpath; it allows distributions that ship with reduced assemblies (e.g. those built with the "hadoop-provided" profile) to set it to add any needed libraries to the classpath when running Spark. (A usage sketch follows the commit list.)

commit d6b8aadf4cd1a321229ce2115e1d2ce3fd2dcbb4
Author: Marcelo Vanzin
Date: 2014-10-27T20:55:45Z

    Propagate SPARK_DIST_CLASSPATH on Yarn.

    Yarn builds the classpath based on the Hadoop configuration, which may miss things when non-Hadoop classes are needed (for example, when Spark is built with "-Phive-provided" and the user is running code that uses HiveContext). So propagate the distribution's classpath variable so that the extra classpath is automatically added to all containers.

commit d2613469fefcfa24d591b2160b9c0dac8733aed5
Author: Marcelo Vanzin
Date: 2014-10-28T18:57:44Z

    Redirect child stderr to parent's log.

    Instead of writing to System.err directly; that way the console is not polluted when running child processes. Also remove an unused env variable that caused a warning when running Spark jobs in child processes.

----
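To make the "SPARK_DIST_CLASSPATH" mechanism above concrete, here is a sketch of how a distribution built with "-Phadoop-provided" might set the variable. Placing it in conf/spark-env.sh and sourcing the value from the "hadoop classpath" command are assumptions for the example; the pull request itself only specifies that compute-classpath.sh appends the variable to the generated classpath.

    # Sketch (e.g. in conf/spark-env.sh of a "hadoop-provided" build):
    # "hadoop classpath" prints the classpath of the local Hadoop install;
    # compute-classpath.sh appends SPARK_DIST_CLASSPATH to the classpath it
    # generates, so the Hadoop jars left out of the assembly become visible.
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

Per the Yarn commit above, the same value would then also be propagated to Yarn containers, so applications pick up the extra classes without per-container configuration.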