Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 6 Jul 2016 15:25:11 +0000 (UTC)
From: "Allen Wittenauer (JIRA)" <jira@apache.org>
To: common-issues@hadoop.apache.org
Message-ID: <JIRA.12985926.1467325706000.37947.1467818711667@Atlassian.JIRA>
In-Reply-To: <JIRA.12985926.1467325706000@Atlassian.JIRA>
References: <JIRA.12985926.1467325706000@Atlassian.JIRA> <JIRA.12985926.1467325706166@arcas>
Subject: [jira] [Commented] (HADOOP-13335) Add an option to suppress the
 'use yarn jar' warning or remove it
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 06 Jul 2016 15:25:14 -0000


    [ https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364457#comment-15364457 ] 

Allen Wittenauer commented on HADOOP-13335:
-------------------------------------------

bq. Not yet - until someone feels the need to have hdfs used as a separate command to do something other than filesystem operations. (Really hope there's no hdfs jar command)

Not on my watch. :) It's pretty clear from this JIRA just how much damage the extra YARN stuff has caused.  (and this is just the tip of the iceberg...)  Plus, between HADOOP-11485, HADOOP-12930, and a handful of other features, there are much better ways for 1st, 2nd, and 3rd parties to integrate extra stuff. \[1]

bq.  I don't see the difference between hadoop jar and yarn jar other than a different set of variables being set and respected by the different commands. 

That's correct. But those extra vars make a world of difference.  In branch-2, HADOOP\_OPTS and YARN\_OPTS don't cross.  Ever.  In effect, the hadoop, hdfs, and mapred entry points are configured differently from yarn.

bq. Stepping back - should the YARN\_\* parameters exist?, and should yarn jar exist? If I understand you correctly, I think you're trying to get rid of some of this.

That's absolutely correct: none of this should exist and I've worked really hard on either removing or hiding the complexity going forward. But this only gets easier in trunk.  It's way too late and way too hard to fix this mess in branch-2.

bq. If 'yarn jar' is something that we think is confusing, or something we potentially want to get rid of - I'd say it's better to not print any warning at all - and leave hadoop jar as is?

It's going to be hard to take yarn jar or hadoop jar away. It's doubtful they will ever get removed. That said, we can at least make them act and work the same way.  To me, that's the ultimate goal and it's pretty close to what happens in trunk:

1. yarn command sucks in yarn-env.sh, hadoop-env.sh, yarn-config.sh and hadoop-config.sh in a way that should be mostly conflict-free. (non-yarn commands do not pull in yarn-x.sh, obviously)
2. If YARN\_OPTS is defined, yarn x (jar, rmadmin, etc) will use it but throw a deprecation warning.
3. Otherwise use HADOOP\_OPTS
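As a rough illustration of steps 2 and 3 only, the precedence could be sketched as below. This is a toy model, not the trunk code: resolve\_yarn\_opts is a made-up helper name, the warning text is invented, and treating an empty YARN\_OPTS as "not defined" is an assumption.

```shell
#!/usr/bin/env bash
# Toy model of the YARN_OPTS / HADOOP_OPTS precedence described above.
# NOT the actual Hadoop implementation; resolve_yarn_opts is a hypothetical name.
resolve_yarn_opts() {
  # Assumption: an empty YARN_OPTS is treated the same as an unset one.
  if [ -n "${YARN_OPTS:-}" ]; then
    # Deprecated variable is still honored, but a warning goes to stderr.
    # The wording of this message is invented for the sketch.
    echo "WARNING: YARN_OPTS is deprecated; use HADOOP_OPTS instead." >&2
    printf '%s\n' "${YARN_OPTS}"
  else
    # Nothing yarn-specific was set: fall back to the common variable.
    printf '%s\n' "${HADOOP_OPTS:-}"
  fi
}
```

With only HADOOP\_OPTS set, the function prints that value silently; with both set, it prints the YARN\_OPTS value and emits the warning on stderr.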

As folks migrate to a release based on trunk, this extra fluff will go away at least configuration-wise.

Eventually, when we can remove support for all of these deprecated vars, this will reduce code complexity and give us (effectively) one code path to test. But bear in mind that we're years away from YARN\_OPTS and friends disappearing.  It's been 5+ years since our last trunk release. I'll likely be dead by the time 4.x comes out and these useless YARN\_\* vars can get culled officially.  But the work has to start now.

bq. The hive binary could unset YARN_OPTS / YARN_CLIENT_OPTS - and leave them intact for the session/shell from where the hive binary was invoked.

Again, if hive wants to stick with hadoop jar, this would be my advice. \[2]  Just keep in mind this also means that any settings users might have wanted to apply to their YARN environment will not kick in. It's no different than what is happening today, but it may not reflect what users want.  Thus we're back to why the warnings went in.
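For what it's worth, a minimal sketch of that unset-in-the-launcher idea might look like the following. The function name run\_without\_yarn\_vars is hypothetical and this is not taken from the hive scripts. The subshell only matters if the function is sourced into an interactive shell; a standalone launcher script already runs as a child process and cannot change the caller's environment.

```shell
#!/usr/bin/env bash
# Hypothetical launcher fragment; run_without_yarn_vars is an illustrative name,
# not something that exists in hive or hadoop.
run_without_yarn_vars() {
  (
    # Parentheses start a subshell, so the unset below is only visible to the
    # wrapped command. The invoking session keeps its YARN_* settings.
    unset YARN_OPTS YARN_CLIENT_OPTS
    "$@"
  )
}

# Intended use (not executed here): run_without_yarn_vars hadoop jar <jar> <args>
```

The trade-off is the one already mentioned: whatever the user put in those variables is silently ignored for the wrapped command.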

\[1]  I'm not saying the community would do this, but let's use hive as an example here of how much more powerful things are in trunk.  With HADOOP-12930, it's now possible for hive to add a 'hadoop hive' command or an (even more outrageous!) 'hdfs hivefs' command.  Rather than integrate *outside* the framework, one could integrate *inside* and pull exactly the information required. This will hopefully put 'hadoop jar' and 'yarn jar' on life support. We'll keep them around but it's going to be much more desirable to just integrate directly.  The new 'mapred streaming' command is a great example here: why make users call hadoop jar with some weird version number when it's now trivial to dynamically add commands at build or configure time?
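To make the "integrate inside" idea concrete, here is a self-contained toy dispatcher. It is not Hadoop's code: dispatch is a made-up function, and the (script)\_subcommand\_(name) lookup is my reading of the dynamic-subcommand convention in trunk, so treat that naming as an assumption rather than a reference.

```shell
#!/usr/bin/env bash
# Toy dispatcher illustrating dynamically added subcommands.
# NOT Hadoop's implementation; the <script>_subcommand_<name> naming is assumed.

# A third party (hive, in this hypothetical) would ship a function like this.
hadoop_subcommand_hive() {
  echo "would launch hive with args: $*"
}

# dispatch <script> <subcommand> [args...]
# Runs a matching <script>_subcommand_<subcommand> function if one is defined.
dispatch() {
  script=$1
  sub=$2
  shift 2
  fn="${script}_subcommand_${sub}"
  if command -v "$fn" >/dev/null 2>&1; then
    "$fn" "$@"
  else
    echo "${script}: unknown subcommand '${sub}'" >&2
    return 1
  fi
}
```

Calling dispatch hadoop hive with some arguments runs the hive-provided function; any subcommand without a matching function falls through to the error branch.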

\[2] Very long term (post-3.x), it would probably be better if hive called hadoop-config.sh and/or hadoop-functions.sh directly.  This would bypass the middleman and give much better control.  I'd be very interested to hear what sort of holes we have in the functionality here that make this hard/impossible. Off the top of my head, I suspect we need to make one big function out of the series of function calls in hadoop-config.sh, but would love to hear your insight on this.

> Add an option to suppress the 'use yarn jar' warning or remove it
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13335
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13335
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HADOOP-13335.01.patch, HADOOP-13335.02.patch, HADOOP-13335.02_branch-2.patch, HADOOP-13335.03.patch, HADOOP-13335.03_branch-2.patch, HADOOP-13335.04.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example - hive uses it to start all its services (HiveServer2, the hive client, beeline etc).
> Using 'yarn jar' to start these services / tools doesn't make a lot of sense - there's no relation to yarn other than requiring the classpath to include yarn libraries.
> I'd propose reverting the changes where this message is printed if YARN variables are set (leave it in the help message), or adding a mechanism which would allow users to suppress this WARNING.

