hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it
Date Wed, 06 Jul 2016 15:25:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364457#comment-15364457 ]

Allen Wittenauer commented on HADOOP-13335:

bq. Not yet - until someone feels the need to have hdfs used as a separate command to do something
other than filesystem operations. (Really hope there's no hdfs jar command)

Not on my watch. :) It's pretty clear from this JIRA just how much damage the extra YARN stuff
has caused.  (and this is just the tip of the iceberg...)  Plus, between HADOOP-11485, HADOOP-12930,
and a handful of other features, there are much better ways for 1st, 2nd, and 3rd parties
to integrate extra stuff. [1]

bq.  I don't see the difference between hadoop jar and yarn jar other than a different set
of variables being set and respected by the different commands. 

That's correct. But those extra vars make a world of difference.  In branch-2, HADOOP_OPTS
and YARN_OPTS don't cross.  Ever.  This effectively means the hadoop, hdfs, and mapred entry
points are configured differently than yarn.
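
As a rough model of the branch-2 split described above, each entry point can be thought of as reading only its own options variable. This is an illustrative sketch, not code from the branch-2 scripts; the function name `opts_for_entry_point` is hypothetical, and only the variable and command names come from the comment.

```shell
# Illustrative only: models the behavior described above, where the yarn
# entry point reads YARN_OPTS and the other entry points read HADOOP_OPTS.
# opts_for_entry_point is a hypothetical helper, not part of Hadoop.
opts_for_entry_point() {
  case "$1" in
    yarn)
      # yarn never sees HADOOP_OPTS in this model
      printf '%s\n' "${YARN_OPTS:-}"
      ;;
    hadoop|hdfs|mapred)
      # the other entry points never see YARN_OPTS in this model
      printf '%s\n' "${HADOOP_OPTS:-}"
      ;;
    *)
      echo "unknown entry point: $1" >&2
      return 1
      ;;
  esac
}
```

Under this model, a setting placed only in YARN_OPTS is invisible to `hadoop jar`, which is the mismatch the warning was meant to flag.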

bq. Stepping back - should the YARN_* parameters exist, and should yarn jar exist? If I
understand you correctly, I think you're trying to get rid of some of this.

That's absolutely correct: none of this should exist and I've worked really hard on either
removing or hiding the complexity going forward. But this only gets easier in trunk.  It's
way too late and way too hard to fix this mess in branch-2.

bq. If 'yarn jar' is something that we think is confusing, or something we potentially want
to get rid of - I'd say it's better to not print any warning at all - and leave hadoop jar
as is?

It's going to be hard to take yarn jar or hadoop jar away. It's doubtful they will ever get
removed. That said, we can at least make them act and work the same way.  To me, that's the
ultimate goal and it's pretty close to what happens in trunk:

1. The yarn command sucks in yarn-env.sh, hadoop-env.sh, yarn-config.sh and hadoop-config.sh in
a way that should be mostly conflict-free. (non-yarn commands do not pull in yarn-x.sh, obviously)
2. If YARN_OPTS is defined, yarn x (jar, rmadmin, etc) will use it but throw a deprecation warning
3. Otherwise, use HADOOP_OPTS
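
The fallback in steps 2 and 3 might look something like the following minimal sketch. The function name `resolve_yarn_opts` and the warning text are hypothetical and are not taken from the trunk scripts; only the YARN_OPTS and HADOOP_OPTS names come from the list above. The sketch also treats "defined" as "set and non-empty", which is an assumption.

```shell
# Hypothetical sketch of the YARN_OPTS -> HADOOP_OPTS fallback.
# Not the actual trunk implementation.
resolve_yarn_opts() {
  if [ -n "${YARN_OPTS:-}" ]; then
    # The deprecated variable still wins, but the user is warned on stderr
    # so that stdout carries only the resolved options.
    echo "WARNING: YARN_OPTS is deprecated; use HADOOP_OPTS instead." >&2
    printf '%s\n' "${YARN_OPTS}"
  else
    # No YARN-specific options: fall back to the shared variable.
    printf '%s\n' "${HADOOP_OPTS:-}"
  fi
}
```

Sending the warning to stderr keeps the resolved value usable in a command substitution such as `opts=$(resolve_yarn_opts)`.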

As folks migrate to a release based on trunk, this extra fluff will go away at least configuration-wise.

Eventually, when we can remove support for all of these deprecated vars, this will reduce
code complexity and give us (effectively) one code path to test. But bear in mind that we're
years away from YARN_OPTS and friends disappearing.  It's been 5+ years since our last trunk
release. I'll likely be dead by the time 4.x comes out and these useless YARN_* vars can
get culled officially.  But the work has to start now.

bq. The hive binary could unset YARN_OPTS / YARN_CLIENT_OPTS - and leave them intact for the
session/shell from where the hive binary was invoked.

Which, again, if hive wants to stick to using hadoop jar, this would be my advice. [2]  Just
keep in mind this also means that any user settings that they might have wanted to apply to
their YARN environment will not kick in. It's no different than what is happening today, but
it may not reflect what users want.  Thus we're back to why the warnings went in.
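
For what the quoted suggestion could look like in practice, here is a minimal sketch of a wrapper that clears the two variables only for the command it launches. The function name `run_without_yarn_opts` is hypothetical, and this is not taken from the hive scripts.

```shell
# Hypothetical wrapper: run a command with YARN_OPTS and YARN_CLIENT_OPTS
# removed from its environment. The parentheses start a subshell, so the
# unset never reaches the caller's shell, even if this file is sourced.
run_without_yarn_opts() {
  (
    unset YARN_OPTS YARN_CLIENT_OPTS
    "$@"
  )
}

# Example call (jar and class names are placeholders):
#   run_without_yarn_opts hadoop jar some.jar SomeMainClass
```

As noted above, the trade-off is that any settings a user deliberately put in those variables are silently dropped for the wrapped command.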

[1]  I'm not saying the community would do this, but let's use hive as an example here of
how much more powerful things are in trunk.  With HADOOP-12930, it's now possible for hive
to add a 'hadoop hive' command or a (even more outrageous!) 'hdfs hivefs' command.  Rather
than integrate *outside* the framework, one could integrate *inside* and pull exactly the
information required. This will hopefully put 'hadoop jar' and 'yarn jar' on life support.
We'll keep them around but it's going to be much more desirable to just integrate directly.
 The new 'mapred streaming' command is a great example here: why make users call hadoop jar
with some weird version number when it's now trivial to dynamically add commands at build
or configure time?

[2] Very long term (post-3.x), it would probably be better if hive called hadoop-config.sh
and/or hadoop-functions.sh directly.  This would bypass the middleman and give much better
control.  I'd be very interested to hear what sort of holes we have in the functionality here
that make this hard/impossible. Off the top, I suspect we need to make one big function of
the series of function calls in hadoop-config.sh, but would love to hear your insight on this.

> Add an option to suppress the 'use yarn jar' warning or remove it
> -----------------------------------------------------------------
>                 Key: HADOOP-13335
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13335
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HADOOP-13335.01.patch, HADOOP-13335.02.patch, HADOOP-13335.02_branch-2.patch, HADOOP-13335.03.patch, HADOOP-13335.03_branch-2.patch, HADOOP-13335.04.patch
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example - hive uses it to start all its services (HiveServer2, the hive client, beeline etc).
> Using 'yarn jar' to start these services / tools doesn't make a lot of sense - there's no relation to yarn other than requiring the classpath to include yarn libraries.
> I'd propose reverting the changes where this message is printed if YARN variables are set (leave it in the help message), or adding a mechanism which would allow users to suppress

