hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-14498) HADOOP_OPTIONAL_TOOLS not parsed correctly
Date Tue, 06 Jun 2017 21:27:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039655#comment-16039655 ]

Allen Wittenauer edited comment on HADOOP-14498 at 6/6/17 9:26 PM:
-------------------------------------------------------------------

HADOOP\_OPTIONAL\_TOOLS basically triggers a read of "libexec/shellprofile.d/(whatever).sh"
that is created at build time by some maven magic and "dev-support/bin/dist-tools-hooks-maker".

The inside of hadoop-azure-datalake.sh (after cutting out the boilerplate) says, effectively:

{code}
function _hadoop-azure-datalake_hadoop_classpath
{
  if [[ -f "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/azure-data-lake-store-sdk-2.1.4.jar" ]]; then
    hadoop_add_classpath "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/azure-data-lake-store-sdk-2.1.4.jar"
  fi
  hadoop_add_classpath "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/hadoop-azure-datalake-3.0.0-alpha4-SNAPSHOT.jar"
}
{code}

i.e., we're going to add azure-data-lake-store-sdk-2.1.4.jar and hadoop-azure-datalake-3.0.0-alpha4-SNAPSHOT.jar
to the classpath.

hadoop-azure.sh, meanwhile, says:

{code}
  if [[ -f "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/azure-storage-4.2.0.jar" ]]; then
    hadoop_add_classpath "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/azure-storage-4.2.0.jar"
  fi
  if [[ -f "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/azure-keyvault-core-0.8.0.jar" ]]; then
    hadoop_add_classpath "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/azure-keyvault-core-0.8.0.jar"
  fi
  hadoop_add_classpath "${HADOOP_TOOLS_HOME}/${HADOOP_TOOLS_LIB_JARS_DIR}/hadoop-azure-3.0.0-alpha4-SNAPSHOT.jar"
{code}

i.e., azure-storage-4.2.0.jar, azure-keyvault-core-0.8.0.jar, and hadoop-azure-3.0.0-alpha4-SNAPSHOT.jar.

The build is generating different dependency lists for the two modules. The cause is either
incorrect/incomplete dependencies in the pom, a bug in the dependency file generator, or something
else going haywire.  It is not a bug in how HADOOP\_OPTIONAL\_TOOLS is getting parsed post-build.
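
To compare what each generated profile would add, something like the following can help. This is only a sketch: list_profile_jars is a hypothetical helper, not part of Hadoop, and it assumes the generated files contain hadoop_add_classpath lines shaped like the excerpts above.

```shell
# Hypothetical helper (not part of Hadoop): print the jar file names that a
# generated shell profile passes to hadoop_add_classpath, one per line.
list_profile_jars() {
  local profile="$1"
  # Keep only the hadoop_add_classpath lines, then strip everything up to
  # the last "/" and the trailing double quote.
  grep 'hadoop_add_classpath' "${profile}" \
    | sed -e 's|.*/||' -e 's|"[[:space:]]*$||' \
    | sort -u
}

# Example usage (commented out); the relative path assumes the current
# directory is the top of an unpacked Hadoop distribution.
# for p in hadoop-azure hadoop-azure-datalake; do
#   echo "== ${p}"
#   list_profile_jars "libexec/shellprofile.d/${p}.sh"
# done
```

If the two lists differ from what the pom declares, that points at the build-time generation rather than at run-time parsing.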

You can actually verify this by using the 'hadoop classpath' command with different settings
in hadoop-env.sh for HADOOP\_OPTIONAL\_TOOLS and with/without --debug:

With just hadoop-azure-datalake:
{code}
$ bin/hadoop --debug classpath
...
DEBUG: Profiles: importing /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-archives.sh
DEBUG: Profiles: importing /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-aws.sh
DEBUG: Profiles: importing /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-azure-datalake.sh
DEBUG: HADOOP_SHELL_PROFILES accepted hadoop-azure-datalake
DEBUG: Profiles: importing /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-azure.sh
DEBUG: Profiles: importing /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-distcp.sh
...
DEBUG: Initial CLASSPATH=/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/common/lib/*
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/common/*
DEBUG: Profiles: hadoop-azure-datalake classpath
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/tools/lib/azure-data-lake-store-sdk-2.1.4.jar
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/tools/lib/hadoop-azure-datalake-3.0.0-alpha4-SNAPSHOT.jar
DEBUG: Profiles: hdfs classpath
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/hdfs
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/hdfs/lib/*
...
/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/etc/hadoop:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/common/lib/*:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/common/*:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/tools/lib/azure-data-lake-store-sdk-2.1.4.jar:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/tools/lib/hadoop-azure-datalake-3.0.0-alpha4-SNAPSHOT.jar:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/hdfs:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/hdfs/lib/*:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/hdfs/*:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/mapreduce/*:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/yarn/lib/*:/Users/aw/H/hadoop-3.0.0-alpha4-SNAPSHOT/share/hadoop/yarn/*
{code}

You'll see both azure-data-lake-store-sdk-2.1.4.jar and hadoop-azure-datalake-3.0.0-alpha4-SNAPSHOT.jar
present in the classpath.
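
Rather than eyeballing the long classpath line, the check can be scripted. A minimal sketch, assuming a colon-separated classpath as printed above; classpath_has_jar is a hypothetical helper, not something Hadoop ships.

```shell
# Hypothetical helper (not part of Hadoop): succeed if some entry of a
# colon-separated classpath has the given jar as its file name.
classpath_has_jar() {
  local classpath="$1" jar="$2"
  # One entry per line, reduce each entry to its last path component,
  # then look for an exact, literal match.
  printf '%s\n' "${classpath}" \
    | tr ':' '\n' \
    | sed 's|.*/||' \
    | grep -q -F -x -e "${jar}"
}

# Example usage (commented out), run from the top of the distribution:
# if classpath_has_jar "$(bin/hadoop classpath)" hadoop-azure-3.0.0-alpha4-SNAPSHOT.jar; then
#   echo "hadoop-azure jar is on the classpath"
# fi
```

The match is on the file name only, so wildcard entries such as share/hadoop/common/lib/* are not expanded and will not count as containing the jar.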


> HADOOP_OPTIONAL_TOOLS not parsed correctly
> ------------------------------------------
>
>                 Key: HADOOP-14498
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14498
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Mingliang Liu
>            Priority: Critical
>
> # This will make hadoop-azure not show up in the hadoop classpath, though both hadoop-aws
> and hadoop-azure-datalake are in the classpath.
> {code:title=hadoop-env.sh}
> export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws,hadoop-azure-datalake"
> {code}
> # And if we put only hadoop-azure and hadoop-aws, both of them are shown in the classpath.
> {code:title=hadoop-env.sh}
> export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws"
> {code}
> This makes me guess that, while parsing the {{HADOOP_OPTIONAL_TOOLS}}, we make some assumptions
> that hadoop tool modules have a single "-" in names, and the _hadoop-azure-datalake_ overrides
> the _hadoop-azure_. Or any other assumptions about the {{${project.artifactId\}}}?
> Ping [~aw].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
