Date: Mon, 1 Jun 2015 21:03:18 +0000 (UTC)
From: "Marcelo Vanzin (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-4048) Enhance and extend hadoop-provided profile

[ https://issues.apache.org/jira/browse/SPARK-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567995#comment-14567995 ]

Marcelo Vanzin commented on SPARK-4048:
---------------------------------------

bq. because Spark Master is explicitly using code from curator jar.

But that's not a valid argument. spark-core explicitly uses a bunch of Hadoop APIs, yet if you enable hadoop-provided, those APIs will not be in the Spark assembly. Again, that is *the whole purpose* of this profile.
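As a minimal sketch of the mechanism being discussed: a Maven profile can mark dependencies with `provided` scope so they are available at compile time but excluded from the packaged assembly. The coordinates below are illustrative only; the actual hadoop-provided profile in Spark's pom.xml covers many more artifacts than this.

```xml
<!-- Illustrative sketch, not Spark's actual profile definition: shows how a
     "provided" scope keeps a dependency on the compile classpath while
     excluding it from the final assembly. -->
<profile>
  <id>hadoop-provided</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <!-- provided: resolved for compilation, omitted from packaging -->
      <scope>provided</scope>
    </dependency>
  </dependencies>
</profile>
```

A build activating such a profile (e.g. `mvn -Phadoop-provided -DskipTests package`) then relies on the deployment environment, such as an existing Hadoop installation, to supply those jars at runtime.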
> Enhance and extend hadoop-provided profile
> ------------------------------------------
>
>                 Key: SPARK-4048
>                 URL: https://issues.apache.org/jira/browse/SPARK-4048
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.3.0
>
>
> The hadoop-provided profile is used to avoid packaging Hadoop dependencies inside the Spark assembly. It works, sort of, but it could use some enhancements. A quick list:
> - It doesn't cover everything that could be removed from the assembly.
> - It doesn't work well when you're publishing artifacts based on it (SPARK-3812 fixes this).
> - Other dependencies could use similar treatment: Hive, HBase (for the examples), Flume, Parquet, and maybe others I'm missing at the moment.
> - Unit tests, more specifically those that use local-cluster mode, do not work when the assembly is built with this profile enabled.
> - The scripts that launch Spark jobs do not add the needed "provided" jars to the classpath when this profile is enabled, leaving people to figure that out for themselves.
> - The examples assembly duplicates a lot of things in the main assembly.
>
> Part of this task is selfish, since we build internally with this profile and we'd like to make it easier for us to merge changes without having to keep too many patches on top of upstream. But these feel like good improvements to me regardless.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org